BIALLELIC MARKERS FOR USE IN CONSTRUCTING A HIGH DENSITY DISEQUILIBRIUM MAP OF THE HUMAN GENOME



Background of the Invention 

Recent advances in genetic engineering and bioinformatics have enabled the manipulation and characterization
of large portions of the human genome. While efforts to obtain the full sequence of the human genome are rapidly
progressing, there are many practical uses for genetic information which can be implemented with partial knowledge of
the sequence of the human genome.

As the full sequence of the human genome is assembled, the partial sequence information available can be used
to identify genes responsible for detectable human traits, such as genes associated with human diseases, and to develop
diagnostic tests capable of identifying individuals who express a detectable trait as the result of a specific genotype or
individuals whose genotype places them at risk of developing a detectable trait at a subsequent time. Each of these
applications for partial genomic sequence information is based upon the assembly of genetic and physical maps which
order the known genomic sequences along the human chromosomes.

The present invention relates to human genomic sequences which can be used to construct a high resolution
map of the human genome, methods for constructing such a map, methods of identifying genes associated with
detectable human traits, and diagnostics for identifying individuals who carry a gene which causes them to express a
detectable trait or which places them at risk of expressing a detectable trait in the future.

Summary of the Invention

A first embodiment of the present invention is a method of obtaining a set of biallelic markers comprising the
steps of obtaining a nucleic acid library comprising a plurality of genomic DNA fragments comprising the full genome or a
portion thereof, determining the order of said plurality of genomic DNA fragments in the genome, determining the
sequence of selected regions of said plurality of genomic DNA fragments, and identifying nucleotides in said plurality of
genomic DNA fragments which vary between individuals, thereby defining a set of biallelic markers.

In one aspect of this first embodiment, the identifying step comprises identifying about 20,000 biallelic
markers. In another aspect of this first embodiment, the identifying step comprises identifying about 40,000 biallelic
markers. In a further aspect of this embodiment, the identifying step comprises identifying about 60,000 biallelic
markers. In still another aspect of this first embodiment, the identifying step comprises identifying about 80,000
biallelic markers. In still another aspect of this first embodiment, the identifying step comprises identifying about
100,000 biallelic markers. In still another aspect of this first embodiment, the identifying step comprises identifying
about 120,000 biallelic markers.

In still another aspect of this first embodiment, the biallelic markers are separated from one another by an
average distance of 10kb-200 kb. In still another aspect of this first embodiment, the biallelic markers are separated
from one another by an average distance of 15kb-150 kb. In still another aspect of this first embodiment, the biallelic
markers are separated from one another by an average distance of 20kb-100 kb. In still another aspect of this first
embodiment, the biallelic markers are separated from one another by an average distance of 100kb-150 kb. In still
another aspect of this first embodiment, the biallelic markers are separated from one another by an average distance of
50-100kb. In still another aspect of this first embodiment, the biallelic markers are separated from one another by an
average distance of 25 kb-50 kb.

In still another aspect of this first embodiment, the step of determining the sequence of selected regions of
said plurality of genomic DNA fragments comprises inserting fragments of said plurality of genomic DNA fragments into
a vector to generate a plurality of subclones and determining the sequence of a region of the inserts in said plurality of
subclones or a subset thereof. For example, in this aspect of the first embodiment, the step of determining the sequence
of a region of said inserts or a subset thereof may comprise determining the sequence of one or both end regions of said
inserts or a subset thereof. In this aspect of the first embodiment, the step of determining the sequence of one or both
end regions of said plurality of subclones comprises determining the sequence of about 500 bases at each end of said
subclones or a subset thereof.

In still another aspect of this first embodiment, a set of about 10,000 to about 20,000 genomic DNA inserts
with an average size between 100kb and 300kb are ordered. In still another aspect of this first embodiment, a set of
about 10,000 to about 30,000 genomic DNA inserts with an average size between 100kb and 150 kb are ordered. In
still another aspect of this first embodiment, a set of about 15,000 to about 25,000 genomic DNA inserts with an
average size between 100kb and 200 kb are ordered.

In still another aspect of this first embodiment, the identifying step comprises identifying between 1 and 6
biallelic markers per genomic DNA fragment. In still another aspect of this first embodiment, the identifying step
comprises identifying an average of 3 biallelic markers per genomic DNA insert.

In still another aspect of this first embodiment, the genomic DNA fragments are in a Bacterial Artificial
Chromosome. In still another aspect of this first embodiment, the genomic DNA fragments are in a Yeast Artificial
Chromosome.

In still another aspect of this first embodiment, the method further comprises determining the position of said
biallelic markers along the genome or a portion thereof. In this aspect of the first embodiment, the step of determining
the position of said biallelic markers along the genome or portion thereof may comprise determining the position of said
biallelic markers along a chromosome. In this aspect of the first embodiment, the step of determining the position of
said biallelic markers along the genome or portion thereof comprises determining the position of said biallelic markers
along a subchromosomal region.

In still another aspect of this first embodiment, the method further comprises identifying biallelic markers
which are in linkage disequilibrium with one another. In this aspect of the first embodiment, the method may further
comprise optimizing the intermarker spacing between said biallelic markers such that each identified marker is in linkage
disequilibrium with at least one other identified marker.
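
The text does not say how linkage disequilibrium between two markers is quantified. As a minimal sketch, assuming
the four two-marker haplotype counts are already known, the standard coefficients D, D' and r^2 could be computed
as follows. The function name and the input format are hypothetical and are not taken from the specification.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r_squared) for two biallelic markers.

    The arguments are counts of the four possible two-marker haplotypes,
    where A/a are the alleles of the first marker and B/b of the second.
    Assumption: haplotype phase is already known; the source text does
    not describe how phase would be obtained.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    # D measures the departure of the AB haplotype frequency from independence.
    D = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a marker is monomorphic in this sample")
    r_squared = D * D / denom

    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    D_prime = abs(D) / d_max
    return D, D_prime, r_squared


D, D_prime, r2 = linkage_disequilibrium(40, 10, 10, 40)
print(f"D={D:.2f} D'={D_prime:.2f} r^2={r2:.2f}")
```

Under this sketch, a marker pair could be treated as "in linkage disequilibrium" when r^2 or D' exceeds a chosen
cutoff; the cutoff itself is a design choice that the text leaves open.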

In still another aspect of this first embodiment, the portion of the genome comprises at least 200 kb of
contiguous genomic DNA. In still another aspect of this first embodiment, the portion of the genome comprises at least
300 kb of contiguous genomic DNA. In still another aspect of this first embodiment, the portion of the genome
comprises at least 500kb of contiguous genomic DNA. In still another aspect of this first embodiment, the portion of the
genome comprises at least 2 Mb of contiguous genomic DNA. In still another aspect of this first embodiment, the portion
of the genome comprises at least 5 Mb of contiguous genomic DNA. In still another aspect of this first embodiment, the
portion of the genome comprises at least 10 Mb of contiguous genomic DNA. In still another aspect of this first
embodiment, the portion of the genome comprises at least 20 Mb of contiguous genomic DNA.

WO 99/04038    PCT/IB98/01193

In still another aspect of this first embodiment, the method further comprises the step of identifying one or
more groups of biallelic markers which are in proximity to one another in the genome. In this aspect of the first
embodiment, the biallelic markers in each of these groups may be located within a genomic region spanning less than
1kb. Alternatively, in this aspect of the first embodiment, the biallelic markers in each of these groups may be located
within a genomic region spanning from 1 to 5kb. Alternatively, in this aspect of the first embodiment, the biallelic markers
in each of these groups may be located within a genomic region spanning from 5 to 10kb. Alternatively, in this aspect of
the first embodiment, the biallelic markers in each of these groups may be located within a genomic region spanning from
10 to 25kb. Alternatively, in this aspect of the first embodiment, the biallelic markers in each of these groups may be
located within a genomic region spanning from 25 to 50kb. Alternatively, in this aspect of the first embodiment, the
biallelic markers in each of these groups may be located within a genomic region spanning from 50 to 150kb.
Alternatively, in this aspect of the first embodiment, the biallelic markers in each of these groups may be located within a
genomic region spanning from 150 to 250kb. Alternatively, in this aspect of the first embodiment, the biallelic markers in
each of these groups may be located within a genomic region spanning from 250 to 500kb. Alternatively, in this aspect of
the first embodiment, the biallelic markers in each of these groups may be located within a genomic region spanning from
500kb to 1Mb. Alternatively, in this aspect of the first embodiment, the biallelic markers in each of these groups may be
located within a genomic region spanning more than 1Mb.

A second embodiment of the present invention is a method of obtaining a set of biallelic markers comprising the
steps of obtaining a nucleic acid library comprising genomic DNA fragments comprising the full genome or a portion
thereof, determining the sequence of selected regions of said genomic DNA fragments, identifying nucleotides in said
genomic DNA fragments which vary between individuals, thereby defining a set of biallelic markers, and
determining the order of said biallelic markers along the genome or portion thereof.

A third embodiment of the present invention is a set of biallelic markers obtained by the method of the first
embodiment. In one aspect of this third embodiment, the markers in said set have a known genomic position. In another
aspect of this third embodiment, the markers in said set have a known genomic relationship to one another.

A fourth embodiment of the present invention is a set of biallelic markers having a known relationship to one
another and a known genomic position, said set of biallelic markers being obtained by the method of the first
embodiment. In one aspect of this fourth embodiment, the biallelic markers have heterozygosity rates of at least about
0.18. In another aspect of this fourth embodiment, the biallelic markers have a heterozygosity rate of at least about 0.32.
In still another aspect of this fourth embodiment, the biallelic markers have a heterozygosity rate of at least about 0.42.
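
The text gives heterozygosity thresholds without stating how the rate is derived. As a hedged illustration,
assuming Hardy-Weinberg proportions (an assumption not made in the text), the expected heterozygosity of a
biallelic marker with allele frequency p is 2p(1 - p). Under that assumption the 0.32 and 0.42 thresholds
correspond to minor allele frequencies of 0.2 and 0.3. The function name below is hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) of a biallelic marker.

    p is the frequency of one of the two alleles. Assumes Hardy-Weinberg
    proportions, which the source text does not state.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 2.0 * p * (1.0 - p)


for freq in (0.1, 0.2, 0.3):
    print(f"minor allele frequency {freq:.1f} -> heterozygosity {heterozygosity(freq):.2f}")
```

Because 2p(1 - p) peaks at 0.5 when p = 0.5, a higher heterozygosity threshold selects markers whose two
alleles are both common in the population.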

A fifth embodiment of the present invention is a map comprising an ordered array of at least 20,000 biallelic 
markers obtained by the method of the first embodiment. In one aspect of this fifth embodiment, the map comprises an 
ordered array of at least 60,000 biallelic markers obtained by the method of the first embodiment. In another aspect of
this fifth embodiment, the map comprises an ordered array of at least 120,000 biallelic markers obtained by the method
of the first embodiment.

In another aspect of this fifth embodiment, the biallelic markers are distributed at an average marker density of
one marker every 150kb. In a further aspect of this fifth embodiment, the biallelic markers are distributed at an average
marker density of one marker every 50 kb. In a further aspect of this fifth embodiment, the biallelic markers are
distributed at an average marker density of one marker every 25 kb.
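
The relation between average marker density and map size can be checked with simple arithmetic. The sketch
below assumes a haploid genome of roughly 3,000 Mb, a figure that is not stated in the text, and uses a
hypothetical helper name. Under that assumption, one marker every 150 kb corresponds to about 20,000 markers
and one marker every 25 kb to about 120,000 markers.

```python
# Assumed haploid genome length in kb (about 3,000 Mb); not stated in the source text.
GENOME_LENGTH_KB = 3_000_000


def markers_needed(average_spacing_kb, genome_length_kb=GENOME_LENGTH_KB):
    """Number of evenly spaced markers giving one marker per average_spacing_kb."""
    if average_spacing_kb <= 0:
        raise ValueError("spacing must be positive")
    return round(genome_length_kb / average_spacing_kb)


for spacing in (150, 50, 25):
    print(f"one marker every {spacing} kb -> about {markers_needed(spacing):,} markers")
```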

A sixth embodiment of the present invention is a method of identifying one or more biallelic markers associated
with a detectable trait comprising the steps of determining the frequencies of each allele of one or more biallelic
markers obtained by the method of the first embodiment in individuals who express said detectable trait and individuals
who do not express said detectable trait, and identifying one or more alleles of said one or more biallelic markers which
are statistically associated with the expression of said detectable trait. In one aspect of this sixth embodiment, the
detectable trait is selected from the group consisting of disease, drug response, drug efficacy, and drug toxicity. In
another aspect of this sixth embodiment, the phenotype of said individuals who express said detectable trait and the
phenotype of said individuals who do not express said detectable trait are readily distinguishable from one another. In
still another aspect of this sixth embodiment, the individuals who express said detectable trait and the individuals who do
not express said detectable trait are selected from a bimodal phenotype distribution. In still another aspect of this sixth
embodiment, the individuals who express said detectable trait are at one phenotypic extreme of the population and said
individuals who do not express said detectable trait are at the other phenotypic extreme of the population.
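
The text does not name the statistical test used to decide that an allele is "statistically associated" with a
trait. One common choice, shown here only as an illustrative sketch, is a Pearson chi-square test with one
degree of freedom on the 2x2 table of allele counts in trait-positive and trait-negative individuals. The
function name and the example counts are hypothetical.

```python
import math


def allele_association(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    case_counts and control_counts are (allele_1, allele_2) counts observed on the
    chromosomes of trait-positive and trait-negative individuals respectively.
    Returns (chi_square, p_value). The choice of test is an assumption; the source
    text only requires a statistical association.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column of the table must be non-empty")
    chi_square = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


chi2, p = allele_association((60, 40), (40, 60))
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

When many markers are tested across a genome-wide map, the significance threshold would normally be adjusted
for multiple comparisons; the text does not specify how.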

A seventh embodiment of the present invention is a method of identifying a haplotype associated with a trait
comprising the steps of obtaining nucleic acid samples from trait positive and trait negative individuals, determining
the frequencies of the alleles of each member of a group of biallelic markers obtained by the method of the first
embodiment which are known to be located in proximity to one another in the genome in said nucleic acid samples, and
identifying a plurality of alleles of biallelic markers having a statistically significant association with said trait. In one
aspect of this seventh embodiment, the detectable trait is selected from the group consisting of disease, drug response,
drug efficacy, and drug toxicity.
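
As a rough sketch of the haplotype comparison, assume each individual's alleles at a group of neighboring
markers have already been resolved into haplotype strings with one character per marker. The text does not
describe such a representation or how phase would be determined, so the helper names and data format below
are hypothetical. The code only ranks haplotypes by their frequency difference between the two groups; a
significance test would still be needed to establish an association.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each haplotype (a string with one allele per marker)."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def rank_haplotypes(trait_positive, trait_negative):
    """Sort haplotypes by frequency in trait-positive minus frequency in trait-negative."""
    f_pos = haplotype_frequencies(trait_positive)
    f_neg = haplotype_frequencies(trait_negative)
    all_haps = set(f_pos) | set(f_neg)
    diffs = {h: f_pos.get(h, 0.0) - f_neg.get(h, 0.0) for h in all_haps}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical three-marker haplotypes observed in each group.
positive = ["AGT"] * 6 + ["ACT"] * 4
negative = ["AGT"] * 2 + ["ACT"] * 8
for hap, diff in rank_haplotypes(positive, negative):
    print(f"{hap}: {diff:+.2f}")
```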

In another aspect of this seventh embodiment, the biallelic markers in each of these groups are located within
a genomic region spanning less than 1kb. In still another aspect of this seventh embodiment, the biallelic markers in each
of these groups are located within a genomic region spanning from 1 to 5kb. In still another aspect of this seventh
embodiment, the biallelic markers in each of these groups are located within a genomic region spanning from 5 to 10kb.
In still another aspect of this seventh embodiment, the biallelic markers in each of these groups are located within a
genomic region spanning from 10 to 25kb. In still another aspect of this seventh embodiment, the biallelic markers in
each of these groups are located within a genomic region spanning from 25 to 50kb. In still another aspect of this seventh
embodiment, the biallelic markers in each of these groups are located within a genomic region spanning from 50 to
150kb. In still another aspect of this seventh embodiment, the biallelic markers in each of these groups are located
within a genomic region spanning from 150 to 250kb. In still another aspect of this seventh embodiment, the biallelic
markers in each of these groups are located within a genomic region spanning from 250 to 500kb. In still another aspect
of this seventh embodiment, the biallelic markers in each of these groups are located within a genomic region spanning
from 500kb to 1Mb. In still another aspect of this seventh embodiment, the biallelic markers in each of these groups are
located within a genomic region spanning more than 1 Mb.

An eighth embodiment of the present invention is a method of identifying one or more biallelic markers
associated with a detectable trait comprising the steps of selecting a gene in which mutations result in a detectable trait
or a gene suspected of being associated with a detectable trait and identifying one or more biallelic markers obtained by
the method of the first embodiment within the genomic region harboring said gene which are associated with said
detectable trait. In one aspect of this eighth embodiment, the detectable trait is selected from the group consisting of
disease, drug response, drug efficacy, and drug toxicity. In another aspect of this eighth embodiment, the identifying step
comprises determining the frequencies of said one or more biallelic markers in individuals who express said detectable
trait and individuals who do not express said detectable trait and identifying one or more biallelic markers which are
statistically associated with the expression of said detectable trait.

A ninth embodiment of the present invention is an array of nucleic acids fixed to a support, said nucleic acids
comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide, of one or more biallelic markers
obtained by the method of the first embodiment. In one aspect of this ninth embodiment, the nucleic acids comprise at
least 15 consecutive nucleotides, including the polymorphic nucleotide, of at least five biallelic markers obtained by the
method of the first embodiment. In another aspect of this ninth embodiment, the nucleic acids comprise at least 8
consecutive nucleotides, including the polymorphic nucleotide, of at least ten biallelic markers obtained by the method
of the first embodiment.

A tenth embodiment of the present invention is an array of nucleic acids fixed to a support, said nucleic acids 
comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide, of one or more groups of biallelic 
markers known to be located in proximity to one another in the genome. 

An eleventh embodiment of the present invention is an array of nucleic acids fixed to a support, said nucleic
acids comprising amplification primers for generating an amplification product comprising at least 8 consecutive
nucleotides, including the polymorphic nucleotide, of one or more biallelic markers obtained by the method of the first
embodiment.

A twelfth embodiment of the present invention is an array of nucleic acids fixed to a support, said nucleic acids
comprising amplification primers for generating an amplification product comprising at least 15 consecutive
nucleotides, including the polymorphic nucleotide, of one or more groups of biallelic markers known to be located in
proximity to one another in the genome.

A thirteenth embodiment of the present invention is an array of nucleic acids fixed to a support, said nucleic
acids comprising one or more microsequencing primers for determining the identity of the polymorphic base of one or
more nucleic acids comprising at least 15 consecutive nucleotides, including the polymorphic nucleotide, of one or more
biallelic markers obtained by the method of the first embodiment.

A fourteenth embodiment of the present invention is an array of nucleic acids fixed to a support, said nucleic
acids comprising one or more microsequencing primers for determining the identity of the polymorphic bases of
one or more groups of biallelic markers known to be located in proximity to one another in the genome.

A fifteenth embodiment of the present invention is an array of nucleic acids fixed to a support wherein said
nucleic acids are complementary to one or more microsequencing primers for determining the identities of the
polymorphic bases of one or more biallelic markers obtained by the method of the first embodiment. In one aspect of
this fifteenth embodiment, the nucleic acids are complementary to at least five microsequencing primers for determining
the identities of the polymorphic bases of at least five biallelic markers obtained by the method of the first embodiment.
In another aspect of this fifteenth embodiment, the nucleic acids are complementary to at least ten microsequencing
primers for determining the identities of the polymorphic bases of at least ten biallelic markers obtained by the method
of the first embodiment.

A sixteenth embodiment of the present invention is an array of nucleic acids fixed to a support, said nucleic
acids comprising one or more nucleic acids complementary to one or more microsequencing primers for determining the
identity of the polymorphic bases of one or more groups of biallelic markers known to be located in proximity to one
another in the genome.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the members of each of said one or more groups of biallelic markers are located in physical
proximity to one another on said support.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or
sixteenth embodiments, wherein said biallelic markers in each of these groups are located within a genomic region
spanning less than 1kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein said biallelic markers in each of these groups are located within a genomic region spanning from 1
to 5kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning from 5
to 10kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning from
10 to 25kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning from
25 to 50kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning from
50 to 150kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning from
150 to 250kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning from
250 to 500kb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning from
500kb to 1Mb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein the biallelic markers in each of these groups are located within a genomic region spanning more
than 1Mb.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein each group of biallelic markers comprises at least 3 biallelic markers.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein each group of biallelic markers comprises at least 6 biallelic markers.

Another aspect of the present invention is an array of any one of the tenth, twelfth, fourteenth or sixteenth
embodiments, wherein each group of biallelic markers comprises at least 20 biallelic markers.

A seventeenth embodiment of the present invention is a method for determining whether an individual is at risk
of developing a detectable trait or suffers from a detectable trait associated with said trait comprising the steps of
obtaining a nucleic acid sample from said individual, screening said nucleic acid sample with one or more biallelic markers
obtained by the method of the first embodiment, and determining whether said nucleic acid sample contains one or more
biallelic markers statistically associated with said detectable trait. In one aspect of this seventeenth embodiment, the
detectable trait is selected from the group consisting of disease, drug response, drug efficacy and drug toxicity. In
another aspect of this seventeenth embodiment, the biallelic markers were obtained by the method of the sixth
embodiment. In another aspect of this seventeenth embodiment, the biallelic markers were obtained by the method of
the eighth embodiment.

An eighteenth embodiment of the present invention is a method of using a drug comprising obtaining a nucleic
acid sample from an individual, determining the identity of the polymorphic base of one or more biallelic markers obtained
by the method of the first embodiment which is associated with a positive response to treatment with said drug or one
or more biallelic markers obtained by the method of the first embodiment which is associated with a negative response
to treatment with said drug, and administering said drug to said individual if said nucleic acid sample contains one or
more biallelic markers associated with a positive response to treatment with said drug or if said nucleic acid sample
lacks one or more biallelic markers associated with a negative response to said drug. In one aspect of this eighteenth
embodiment, the determining step comprises determining the identity of the polymorphic base of one or more biallelic
markers obtained by the method of the aspect of the sixth embodiment wherein the trait is drug response which is
associated with a positive response to treatment with said drug or one or more biallelic markers obtained by the aspect
of the sixth embodiment wherein the trait is drug response which is associated with a negative response to treatment
with said drug. In another aspect of this eighteenth embodiment, the determining step comprises determining the
identity of the polymorphic base of one or more biallelic markers obtained by the aspect of the eighth embodiment
wherein the trait is drug response which is associated with a positive response to treatment with said drug or one or
more biallelic markers obtained by the method of the aspect of the eighth embodiment wherein the trait is drug response
which is associated with a negative response to treatment with said drug.

A nineteenth embodiment of the present invention is a method of selecting an individual for inclusion in a
clinical trial of a drug comprising obtaining a nucleic acid sample from an individual, determining the identity of the
polymorphic base of one or more biallelic markers obtained by the method of the first embodiment which is associated
with a positive response to treatment with said drug or one or more biallelic markers associated with a negative
response to treatment with said drug in said nucleic acid sample, and including said individual in said clinical trial if said
nucleic acid sample contains one or more biallelic markers obtained by the method of the first embodiment which is
associated with a positive response to treatment with said drug or if said nucleic acid sample lacks one or more biallelic
markers associated with a negative response to said drug. In one aspect of this nineteenth embodiment, the determining
step comprises determining the identity of the polymorphic base of one or more biallelic markers obtained by the aspect
of the sixth embodiment wherein the trait is drug response which is associated with a positive response to treatment
with said drug or one or more biallelic markers obtained by the aspect of the sixth embodiment wherein the trait is drug
response which is associated with a negative response to treatment with said drug. In another aspect of this nineteenth
embodiment, the determining step comprises determining the identity of the polymorphic base of one or more biallelic
markers obtained by the aspect of the eighth embodiment wherein the trait is drug response which is associated with a
positive response to treatment with said drug or one or more biallelic markers obtained by the aspect of the eighth
embodiment wherein the trait is drug response which is associated with a negative response to treatment with said
drug.

A twentieth embodiment of the present invention is a method of identifying a gene associated with a
detectable trait comprising the steps of determining the frequency of each allele of one or more biallelic markers
obtained by the method of the first embodiment in individuals having said detectable trait and individuals lacking said
detectable trait, identifying one or more alleles of one or more biallelic markers having a statistically significant
association with said detectable trait, and identifying a gene in linkage disequilibrium with said one or more alleles.
In one aspect of this twentieth embodiment, the method further comprises identifying a mutation in the gene which is
associated with said detectable trait. In another aspect of this twentieth embodiment, the detectable trait is selected
from the group consisting of disease, drug response, drug efficacy, and drug toxicity.

A twenty-first embodiment of the present invention is a method of identifying a gene associated with a
detectable trait comprising selecting a gene suspected of being associated with a detectable trait and identifying
one or more biallelic markers obtained by the method of the first embodiment within the genomic region harboring said
gene which are associated with said detectable trait. In one aspect of this twenty-first embodiment, the detectable trait
is selected from the group consisting of disease, drug response, drug efficacy, and drug toxicity. In another aspect of
this twenty-first embodiment, the identifying step comprises determining the frequencies of said one or more biallelic
markers in individuals who express said detectable trait and individuals who do not express said detectable trait and
identifying one or more biallelic markers which are statistically associated with the expression of said
detectable trait.

A twenty-second embodiment of the present invention is a method of identifying a haplotype associated with
a trait comprising the steps of obtaining nucleic acid samples from trait positive and trait negative individuals,
conducting an amplification reaction on said nucleic acid samples using amplification primers capable of
generating amplification products containing the polymorphic bases of a plurality of biallelic markers, contacting one or
more arrays according to the tenth embodiment with said amplification products, determining the identities of the
polymorphic bases of said amplification products, and identifying a haplotype having a statistically significant
association with said trait.

A twenty-third embodiment of the present invention is a method of identifying a haplotype associated with a
trait comprising the steps of obtaining nucleic acid samples from trait positive and trait negative individuals, conducting
amplification reactions on said nucleic acid samples using amplification primers capable of generating amplification
products containing the polymorphic bases of a plurality of biallelic markers, contacting one or more arrays according to
the fourteenth embodiment with said amplification products, conducting microsequencing reactions on said
amplification products using microsequencing primers on said arrays, thereby generating elongated microsequencing
primers comprising the polymorphic bases of said amplification products, determining the identities of said polymorphic
bases, and identifying a haplotype having a statistically significant association with said trait.

A twenty-fourth embodiment of the present invention is a method of identifying a haplotype associated with a
trait comprising the steps of obtaining nucleic acid samples from trait positive and trait negative individuals, conducting
amplification reactions on said nucleic acid samples using amplification primers which are capable of generating
amplification products containing the polymorphic bases of a plurality of biallelic markers, conducting microsequencing
reactions on said nucleic acid samples, thereby generating microsequencing products containing the polymorphic bases
of one or more biallelic markers at their 3' ends, said polymorphic bases being detectably labeled, contacting one or more
arrays according to the sixteenth embodiment with said microsequencing products such that said microsequencing
products specifically hybridize to said nucleic acids complementary to said microsequencing primers, determining
the identities of the polymorphic bases of said microsequencing products, and identifying a haplotype having a
statistically significant association with said trait.

A twenty-fifth embodiment of the present invention is a method of identifying a haplotype associated with a
trait comprising the steps of obtaining nucleic acid samples from trait positive and trait negative individuals, contacting
one or more arrays according to the twelfth embodiment with said nucleic acid samples, conducting an amplification
reaction on said nucleic acid samples using amplification primers on said array which are capable of generating
amplification products containing the polymorphic bases of a plurality of biallelic markers, determining the identities of
the polymorphic bases of said amplification products, and identifying a haplotype having a statistically significant
association with said trait.

A twenty-sixth embodiment of the present invention is a method of determining whether an individual is at risk
of developing Alzheimer's disease or whether the individual suffers from Alzheimer's disease as a result of possessing
the Apo E e4 Site A allele comprising obtaining a nucleic acid sample from said individual, and determining the identity
of the polymorphic base in one or more of the sequences selected from the group consisting of SEQ ID Nos. 301-305 and
SEQ ID Nos. 307-311 or the sequences complementary thereto in said nucleic acid sample. In one aspect of this
twenty-sixth embodiment, the method further comprises determining whether said nucleic acid sample contains the sequence of
SEQ ID No. 306 or the sequence complementary thereto. In another aspect of this twenty-sixth embodiment, the step of
determining the identity of the polymorphic bases in one or more of the sequences selected from the group consisting of
SEQ ID Nos. 301-305 and SEQ ID Nos. 307-311 or the sequences complementary thereto comprises determining
whether said nucleic acid sample contains the sequence of SEQ ID No. 311 (the T allele of marker 99-365;344) or the
sequence complementary thereto. In another version of the preceding aspect, the method further comprises determining whether
said nucleic acid sample contains the sequence of SEQ ID No. 306 or the sequence complementary thereto.

A twenty-seventh embodiment of the present invention is an isolated nucleic acid comprising a sequence
selected from the group consisting of SEQ ID No. 301, SEQ ID No. 307, the sequences complementary thereto, and
fragments comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide, thereof.

A twenty-eighth embodiment of the present invention is an isolated nucleic acid comprising a sequence
selected from the group consisting of SEQ ID No. 302, SEQ ID No. 308, the sequences complementary thereto, and
fragments comprising at least 6 consecutive nucleotides thereof.

A twenty-ninth embodiment of the present invention is an isolated nucleic acid comprising a sequence selected
from the group consisting of SEQ ID No. 303, SEQ ID No. 309, the sequences complementary thereto, and fragments
comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide, thereof.

A thirtieth embodiment of the present invention is an isolated nucleic acid comprising a sequence selected from
the group consisting of SEQ ID No. 304, SEQ ID No. 310, the sequences complementary thereto, and fragments
comprising at least 6 consecutive nucleotides, including the polymorphic nucleotide, thereof.

A thirty first embodiment of the present invention is an isolated nucleic acid comprising a sequence selected
from the group consisting of SEQ ID No. 305, SEQ ID No. 311, the sequences complementary thereto, and fragments
comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide, thereof.

A thirty second embodiment of the present invention is an isolated nucleic acid comprising a sequence selected
from the group consisting of SEQ ID Nos. 313-317, SEQ ID Nos. 319-323, and fragments comprising at least 8
consecutive nucleotides thereof.

A thirty third embodiment of the present invention is an isolated nucleic acid comprising a sequence selected from
the group consisting of SEQ ID Nos. 325-329, SEQ ID Nos. 331-335, the sequences complementary thereto, and
fragments comprising at least 8 consecutive nucleotides thereof.

A thirty fourth embodiment of the present invention is a set of nucleic acids comprising at least 8 consecutive
nucleotides, including the polymorphic nucleotide, of one or more biallelic markers obtained by the method of the first
embodiment.

A thirty fifth embodiment of the present invention is a set of nucleic acids comprising amplification primers for
generating an amplification product comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide,
of one or more biallelic markers obtained by the method of the first embodiment.

A thirty sixth embodiment of the present invention is a set of nucleic acids comprising one or more
microsequencing primers for determining the identity of the polymorphic base of one or more nucleic acids comprising at
least 8 consecutive nucleotides, including the polymorphic nucleotide, of one or more biallelic markers obtained by the
method of the first embodiment.

Brief Description of the Drawings
Figure 1 is a cytogenetic map of chromosome 21.

Figure 2a shows the results of a computer simulation of the distribution of inter-marker spacing on a randomly
distributed set of biallelic markers indicating the percentage of biallelic markers which will be spaced a given distance
apart for 1, 2, or 3 markers/BAC in a genomic map (assuming a set of 20,000 minimally overlapping BACs covering the
genome are evaluated).

Figure 2b shows the results of a computer simulation of the distribution of inter-marker spacing on a randomly
distributed set of biallelic markers indicating the percentage of biallelic markers which will be spaced a given distance
apart for 1, 3, or 6 markers/BAC in a genomic map (assuming a set of 20,000 minimally overlapping BACs covering the
genome are evaluated).

Figure 3 shows, for a series of hypothetical sample sizes, the p-value significance obtained in association
studies performed using individual markers from the high density biallelic map, according to various hypotheses regarding
the difference of allelic frequencies between the T+ and T- samples.

Figure 4 is a hypothetical association analysis conducted with a map comprising about 3,000 biallelic markers.

Figure 5 is a hypothetical association analysis conducted with a map comprising about 20,000 biallelic markers.

Figure 6 is a hypothetical association analysis conducted with a map comprising about 60,000 biallelic markers.

Figure 7 is a haplotype analysis using biallelic markers in the Apo E region.

Figure 8 is a simulated haplotype analysis using the biallelic markers in the Apo E region included in the
haplotype analysis of Figure 7.

Figure 9 shows a minimal array of overlapping clones which was chosen for further studies of biallelic markers
associated with prostate cancer, the positions of STS markers known to map in the candidate genomic region along the
contig, and the locations of biallelic markers along the BAC contig harboring a genomic region harboring a candidate gene
associated with prostate cancer which were identified using the methods of the present invention.

Figure 10 is a rough localization of a candidate gene for prostate cancer which was obtained by determining
the frequencies of the biallelic markers of Figure 9 in affected and unaffected populations.

Figure 11 is a further refinement of the localization of the candidate gene for prostate cancer using additional
biallelic markers which were not included in the rough localization illustrated in Figure 10.

Figure 12 is a haplotype analysis using the biallelic markers in the genomic region of the gene associated with
prostate cancer.

Figure 13 is a simulated haplotype using the six markers included in haplotype 5 of Figure 11.

Detailed Description of the Preferred Embodiment
The human haploid genome contains an estimated 80,000 to 100,000 or more genes scattered on a
3x10^9 base-long double stranded DNA shared among the 24 chromosomes. Each human being is diploid, i.e. possesses
two haploid genomes, one from paternal origin, the other from maternal origin. The sequence of the human genome
varies among individuals in a population. About 10^7 sites scattered along the 3x10^9 base pairs of DNA are polymorphic,
existing in at least two variant forms called alleles. Most of these polymorphic sites are generated by single base
substitution mutations and are biallelic. Less than 10^ polymorphic sites are due to more complex changes and are very
often multi-allelic, i.e. exist in more than two allelic forms. At a given polymorphic site, any individual (diploid) can be
either homozygous (twice the same allele) or heterozygous (two different alleles). A given polymorphism or rare mutation
can be either neutral (no effect on trait), or functional, i.e. responsible for a particular genetic trait.

Genetic Maps

The first step towards the identification of genes associated with a detectable trait, such as a disease or any
other detectable trait, consists in the localization of genomic regions containing trait-causing genes using genetic
mapping methods. The preferred traits contemplated within the present invention relate to fields of therapeutic interest;
in particular embodiments, they will be disease traits and/or drug response traits, reflecting drug efficacy or toxicity.
Traits can either be "binary", e.g. diabetic vs. non diabetic, or "quantitative", e.g. elevated blood pressure. Individuals
affected by a quantitative trait can be classified according to an appropriate scale of trait values, e.g. blood pressure
ranges. Each trait value range can then be analyzed as a binary trait. Patients showing a trait value within one such
range will be studied in comparison with patients showing a trait value outside of this range. In such a case, genetic
analysis methods will be applied to subpopulations of individuals showing trait values within defined ranges.
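
The reduction of a quantitative trait to a binary one can be sketched as follows. This is a minimal illustration only: the function name, the half-open range convention and the blood pressure readings are hypothetical and are not taken from the specification.

```python
def split_by_trait_range(trait_values, low, high):
    """Split individuals into those inside and outside a trait value range.

    trait_values: mapping of individual ID -> measured trait value.
    The range is treated as half-open: low <= value < high (an assumption).
    Returns (inside, outside) as two sets of individual IDs.
    """
    inside = {ind for ind, value in trait_values.items() if low <= value < high}
    outside = set(trait_values) - inside
    return inside, outside

# Hypothetical systolic blood pressure readings (mmHg), keyed by individual ID.
readings = {"ind1": 118, "ind2": 142, "ind3": 155, "ind4": 127, "ind5": 161}
in_range, out_of_range = split_by_trait_range(readings, 140, 180)
print(sorted(in_range))      # individuals treated as trait positive for this range
print(sorted(out_of_range))  # comparison group
```

Each range of interest yields its own pair of groups, which can then be compared in the same way as any binary trait.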

Genetic mapping involves the analysis of the segregation of polymorphic loci in trait
positive and trait negative populations. Polymorphic loci constitute a small fraction of the human
genome (less than 1%), compared to the vast majority of human genomic DNA which is identical in
sequence among the chromosomes of different individuals. Among all existing human polymorphic
loci, genetic markers can be defined as genome-derived polynucleotides which are sufficiently
polymorphic to allow a reasonable probability that a randomly selected person will be heterozygous,
and thus informative for genetic analysis by methods such as linkage analysis or association studies.

A genetic map consists of a collection of polymorphic markers which have been positioned on the human
chromosomes. Genetic maps may be combined with physical maps, collections of ordered overlapping fragments of
genomic DNA whose arrangement along the human chromosomes is known. The optimal genetic map should possess
the following characteristics:

- the density of the genetic markers scattered along the genome should be sufficient to allow the identification and
localization of any trait-related polymorphism,

- each marker should have an adequate level of heterozygosity, so as to be informative in a large percentage of different
meioses,

- all markers should be easily typed on a routine basis, at a reasonable expense, and in a reasonable amount of time,

- the entire set of markers per chromosome should be ordered in a highly reliable fashion.

However, while the above maps are optimal, it will be appreciated that the maps of the present invention may
be used in the individual marker and haplotype association analyses described below without the necessity of
determining the order of biallelic markers derived from a single BAC with respect to one another.

Genetic Maps Based on RFLPs or VNTRs
The analysis of DNA polymorphisms has relied on the following types of polymorphisms. The first generation
of genetic markers were restriction fragment length polymorphisms (RFLPs), single nucleotide polymorphisms which
occur at restriction sites, thereby modifying the cleavage pattern of the corresponding restriction enzyme. Though the
original methods used to type RFLPs were material-, effort- and time-consuming, today these markers can easily be
typed by PCR-based technologies. Since they are biallelic markers (they present only two alleles, the restriction site
being either present or absent), their maximum heterozygosity is 0.5. The theoretical number of RFLPs distributed along
the entire human genome is more than 10^5, which leads to a potential average inter-marker distance of 30 kilobases.
However, in reality the number of evenly distributed RFLPs which occur at a sufficient frequency in the population to
make them useful for tracking of genetic polymorphisms is very limited.

The second generation of genetic markers was VNTRs (Variable Number of Tandem Repeats), which can be
categorized as either minisatellites or microsatellites. Minisatellites are tandemly repeated DNA sequences present in
units of 5-50 repeats which are distributed along regions of the human chromosomes ranging from 0.1 to 20 kilobases in
length. Since they present many possible alleles, their polymorphic informative content is very high. Minisatellites are
scored by performing Southern blots to identify the number of tandem repeats present in a nucleic acid sample from the
individual being tested. However, there are only 10^ potential VNTRs that can be typed by Southern blotting.

Microsatellites (also called simple tandem repeat polymorphisms, or simple sequence length polymorphisms)
constitute the most developed category of genetic markers. They include small arrays of tandem repeats of simple
sequences (di-, tri-, tetra-nucleotide repeats) which exhibit a high degree of length polymorphism and thus a high level of
informativeness. Slightly more than 5,000 microsatellites, easily typed by PCR-derived technologies, have been ordered
along the human genome (Dib et al., Nature 380:152 (1996), the disclosure of which is incorporated herein by
reference).

A number of these available microsatellites were used to construct integrated physical and genetic maps
containing less than 5,000 markers. For example, CEPH (Chumakov et al., Nature 377:175-298 (1995) and Cohen et al.,
Nature 366:698-701 (1993), the disclosures of which are incorporated herein by reference), and Whitehead Institute
and Genethon (Hudson et al., 1995) constructed genetic and physical maps covering 75% to 95% of the human genome,
based on 2,500 to 5,000 microsatellite markers.

However, the number of easily typed informative markers in these maps was too small for the average
distance between informative markers to fulfill the above-listed requirements for genetic maps.

Biallelic Markers

Biallelic markers are genome-derived polynucleotides which exhibit biallelic polymorphism. As used herein, the
term biallelic marker means a biallelic single nucleotide polymorphism. As used herein, the term polymorphism may
include a single base substitution, insertion, or deletion. By definition, the lowest allele frequency of a biallelic
polymorphism is 1% (sequence variants which show allele frequencies below 1% are called rare mutations). There are
potentially more than 10^ biallelic markers which can easily be typed by routine automated techniques, such as
sequence- or hybridization-based techniques, out of which 10^6 are sufficiently informative for mapping purposes.
However, a biallelic marker will show a sufficient degree of informativeness for use in genetic mapping only if the
frequency of its less frequent allele is not less than about 10% (i.e. a heterozygosity rate of at least 0.18) (the
heterozygosity rate for a biallelic marker is 2 P_a (1 - P_a), where P_a is the frequency of allele a). Preferably, the frequency
of the less frequent allele of the biallelic markers in the present maps is at least 20% (i.e. a heterozygosity rate of at
least 0.32). More preferably, the frequency of the less frequent allele of the biallelic markers in the present maps is at
least 30% (i.e. its heterozygosity rate is higher than about 0.42).
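
As a quick check of the heterozygosity formula given above, the short sketch below evaluates 2 P_a (1 - P_a) at the three allele frequency thresholds mentioned in the text. The function name is introduced here for illustration only.

```python
def heterozygosity(p_a):
    """Expected heterozygosity of a biallelic marker: 2 * P_a * (1 - P_a)."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p_a * (1.0 - p_a)

# Frequency of the less frequent allele at each threshold named in the text.
for p in (0.10, 0.20, 0.30):
    print(f"minor allele frequency {p:.2f} -> heterozygosity {heterozygosity(p):.2f}")
```

The formula gives about 0.18, 0.32 and 0.42 at allele frequencies of 10%, 20% and 30%, and reaches its maximum of 0.5 when both alleles are equally frequent.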

Initial attempts to construct genetic maps based on non RFLP biallelic markers have focused on identifying
biallelic markers lying within sequence tagged sites (STS), pieces of genomic DNA having a known sequence and
averaging about 250 bases in length. More than 30,000 STSs have been identified and ordered along the genome
(Hudson et al., Science 270:1945-1954 (1995); Schuler et al., Science 274:540-546 (1996), the disclosures of which
are incorporated herein by reference). For example, the Whitehead Institute and Genethon's integrated map contains
15,086 STSs.

These sequence tagged sites can be screened to identify polymorphisms, preferably Single Nucleotide
Polymorphisms (SNPs), more preferably non RFLP biallelic markers therein. Generally polymorphisms are identified by
determining the sequence of the STSs in 5 to 10 individuals.

Wang et al. (Cold Spring Harbor Laboratory: Abstracts of papers presented on Genome Mapping and
Sequencing p. 17 (May 14-18, 1997), the disclosure of which is incorporated herein by reference) recently announced the
identification and mapping of 750 Single Nucleotide Polymorphisms issued from the sequencing of 12,000 STSs from
the Whitehead/MIT map, in eight unrelated individuals. The map was assembled using a high throughput system based
on the utilization of DNA chip technology available from Affymetrix (Chee et al., Science 274:610-614 (1996), the
disclosure of which is incorporated herein by reference).

However, according to experimental data and statistical calculations, less than one out of 10 of all STSs
mapped today will contain an informative Single Nucleotide Polymorphism. This is primarily due to the short length of
existing STSs (usually less than 250 bp). If one assumes 10^6 informative SNPs spread along the human genome, there
would on average be one marker of interest every 3x10^9/10^6, i.e. every 3,000 bp. The probability that one such marker
is present on a 250 bp stretch is thus less than 1/10.
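
The arithmetic behind this estimate can be reproduced with the sketch below. It assumes a 3x10^9 bp genome and 10^6 evenly spread informative SNPs (the count implied by the 3,000 bp spacing); the function names are introduced here for illustration.

```python
GENOME_BP = 3_000_000_000  # approximate size of the haploid human genome

def mean_spacing_bp(n_markers, genome_bp=GENOME_BP):
    """Average distance between markers if they were spread evenly."""
    return genome_bp / n_markers

def expected_markers_in_window(window_bp, n_markers, genome_bp=GENOME_BP):
    """Expected number of markers falling inside a window of the given length."""
    return window_bp * n_markers / genome_bp

informative_snps = 10**6  # assumed number of informative SNPs
print(mean_spacing_bp(informative_snps))                  # average spacing in bp
print(expected_markers_in_window(250, informative_snps))  # per 250 bp STS
```

With these assumptions the spacing is 3,000 bp and a 250 bp STS is expected to carry about 0.08 informative markers, which is below the 1/10 bound given in the text. The same spacing function applied to 10^5 markers gives 30 kilobases.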

While it could produce a high density map, the STS approach based on currently existing markers does not put
any systematic effort into making sure that the markers obtained are optimally distributed throughout the entire
genome. Instead, polymorphisms are limited to those locations for which STSs are available.

The even distribution of markers along the chromosomes is critical to the future success of genetic analyses.
In particular, a high density map having appropriately spaced markers is essential for conducting association studies on
sporadic cases, aiming at identifying genes responsible for detectable traits such as those which are described below.

As will be further explained below, genetic studies have mostly relied in the past on a statistical approach
called linkage analysis, which took advantage of microsatellite markers to study their inheritance pattern within families
from which a sufficient number of individuals presented the studied trait. Because of intrinsic limitations of linkage
analysis, which will be further detailed below, and because these studies necessitate the recruitment of adequate family
pedigrees, they are not well suited to the genetic analysis of all traits, particularly those for which only sporadic cases
are available (e.g. drug response traits), or those which have a low penetrance within the studied population.

Association studies offer an alternative to linkage analysis. Combined with the use of a high density map of
appropriately spaced, sufficiently informative markers, association studies, including linkage disequilibrium-based
genome wide association studies, will enable the identification of most genes involved in complex traits.

The present invention relates to a method for generating a high density linkage disequilibrium-based genetic
map of the human genome which will allow the identification of sufficiently informative markers spaced at intervals
which permit their use in identifying genes responsible for detectable traits using genome-wide association studies and
linkage disequilibrium mapping.

Construction of a Physical Map 
The first step in constructing a high density genetic map of biallelic markers is the construction of a physical
map. Physical maps consist of ordered, overlapping cloned fragments of genomic DNA covering a portion of the genome,
preferably covering one or all chromosomes. Obtaining a physical map of the genome entails constructing and ordering a
genomic DNA library.

Physical mapping in complex genomes such as the human genome (3,000 Megabases) requires the construction
of DNA libraries containing large inserts (on the order of 0.1 to 1 Megabase). It is crucial that such libraries be easy to
construct, screen and manipulate, and that the DNA inserts be stable and relatively free of chimerism.

Yeast artificial chromosomes (YACs; Burke et al., Science 236:806-812 (1987), the disclosure of which is
incorporated herein by reference) have provided an invaluable tool in the analysis of complex genomes since their cloning
capacity is extremely high (in the Mb range). YAC libraries containing large DNA inserts (up to 2 Mb) have been used to
generate STS-content maps of individual chromosomes or of the entire human genome (Chumakov et al. (1995), supra;
Hudson et al. (1995), supra; Cohen et al., Nature 366:698-701 (1993); Chumakov et al., Nature 359:380-387 (1992);
Gemmill et al., Nature 377:293-319 (1995); Doggett et al., Nature 377:335-365 (1995); the disclosures of which are
incorporated herein by reference).

The present genetic maps may be constructed using currently available YAC genomic libraries such as the
CEPH human YAC library as a starting material (Chumakov et al. (1995), supra). Alternatively, one may construct a
YAC genomic library as described in Chumakov et al., 1995, the disclosure of which is incorporated herein by reference,
or as described below.

Once a YAC genomic library has been obtained, the genomic DNA fragments therein are ordered. Ordering may
be performed directly on the genomic DNA in the YAC library. However, direct ordering of YAC inserts is not preferred
because YAC libraries often exhibit a high rate of chimerism (40 to 50% of YAC clones contain fragments from more
than one genomic region), often suffer from clonal instability within their genomic DNA inserts, and require tedious
procedures to manipulate and isolate the insert DNA. Instead, it is preferable to conduct the mapping and sequencing
procedures required for ordering the genomic DNA in a system which enables the stable cloning of large inserts while
being easy to manipulate using standard molecular biology techniques.

Accordingly, it is preferable to clone the genomic DNA into bacterial single copy plasmids, for example BACs
(Bacterial Artificial Chromosomes), rather than into YACs. Bacterial artificial chromosomes are well suited for use in
ordering genomic DNA fragments. BACs provide a low rate of chimerism and fragment rearrangement, together with
relative ease of insert isolation. Thus BAC libraries are well suited to integrate genetic, STS and cytogenetic
information while providing direct access to stable, readily-sequenceable genomic DNA. An example of a bacterial artificial
chromosome is the BAC cloning system of Shizuya et al., which is capable of stably propagating and maintaining
relatively large genomic DNA fragments (up to 300 kb long) as single-copy plasmids in E. coli (Shizuya et al., Proc. Natl.
Acad. Sci. USA 89:8794-8797 (1992), the disclosure of which is incorporated herein by reference).

Example 1 describes the construction of a BAC library containing human genomic DNA. It will be appreciated
that the source of the genomic DNA, the enzymes used to digest the DNA, the vectors into which the genomic DNA is
inserted, and the size of the DNA inserts which are cloned into said vectors need not be identical to those described in
Example 1 below. Rather, the genomic DNA may be obtained from any appropriate source, may be digested with any
appropriate enzyme, and may be cloned into any suitable vector. Insert size may vary within any range compatible with
the cloning system chosen and with the intended purpose of the library being constructed. Typically, using BAC vectors
to construct DNA libraries covering the entire human genome, insert size may vary between 50 kb and 300 kb, preferably
between 100 kb and 200 kb.

Example 1
Construction of a BAC Library
Three different human genomic DNA libraries were produced by cloning partially digested DNA from a human
lymphoblastoid cell line (derived from individual 8445, CEPH families) into the pBeloBAC11 vector (Kim et al.,
Genomics 34:213-218 (1996), the disclosure of which is incorporated herein by reference). One library was produced
using a BamHI partial digestion of the genomic DNA from the lymphoblastoid cell line and contains 110,000 clones
having an average insert size of 150 kb (corresponding to 5 human haploid genome equivalents). Another library was
prepared from a HindIII partial digest and corresponds to 3 human genome equivalents with an average insert size of
150 kb. A third library was prepared from a NdeI partial digest and corresponds to 4 human genome equivalents with an
average insert size of 150 kb.
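
The relationship between clone number, insert size and genome coverage can be sketched as follows. The 3,000 Megabase genome size is taken from the text; the function names are introduced here, and the clone counts computed for the second and third libraries are derived figures, not values stated in the example.

```python
import math

HAPLOID_GENOME_KB = 3_000_000  # 3,000 Megabases expressed in kilobases

def genome_equivalents(n_clones, mean_insert_kb):
    """Total cloned DNA expressed as multiples of the haploid genome."""
    return n_clones * mean_insert_kb / HAPLOID_GENOME_KB

def clones_needed(equivalents, mean_insert_kb):
    """Smallest clone count giving at least the requested genome coverage."""
    return math.ceil(equivalents * HAPLOID_GENOME_KB / mean_insert_kb)

print(genome_equivalents(110_000, 150))  # BamHI library described above
print(clones_needed(3, 150))             # clones implied by 3 equivalents at 150 kb
print(clones_needed(4, 150))             # clones implied by 4 equivalents at 150 kb
```

For the BamHI library the calculation gives 5.5 genome equivalents, in line with the approximate figure of 5 given in the example.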

Alternatively, the genomic DNA may be inserted into BAC vectors which possess both a high copy number
origin of replication, which facilitates the isolation of the vector DNA, and a low copy number origin of replication.
Cloning of a genomic DNA insert into the high copy number origin of replication inactivates the origin such that clones
containing a genomic insert replicate at low copy number. The low copy number of clones having a genomic insert
therein permits the inserts to be stably maintained. In addition, selection procedures may be designed which enable low
copy number plasmids (i.e. vectors having genomic inserts therein) to be selected. Such vectors and selection procedures
are described in the U.S. Patent Application entitled "High Throughput DNA Sequencing Vector" (GENSET.015A, Serial
No. 09/058,746), the disclosure of which is incorporated herein by reference.

It will be appreciated that the present methods may be practiced using BAC vectors other than those of
Shizuya et al. (1992, supra), or derived from those, or vectors other than BAC vectors which possess the
above-described characteristics.

To construct a physical map of the genome from genomic DNA libraries, the library clones have to be ordered
along the human chromosomes. In a preferred embodiment, a minimal subset of the ordered clones will then be chosen
that completely covers the entire genome.
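
One conventional way to pick a small covering subset from clones whose positions are already known is a greedy interval cover, sketched below. This is an illustrative assumption rather than the procedure of the specification: clones are modeled as hypothetical (name, start, end) coordinates, whereas in practice their order would be inferred from STS content.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy selection of overlapping clones covering [region_start, region_end].

    clones: iterable of (name, start, end) tuples with start < end.
    At each step, among clones starting at or before the position covered so
    far, the one reaching furthest is kept. Raises ValueError on a gap.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = region_start
    index = 0
    while covered < region_end:
        best = None
        # Examine every clone that overlaps or abuts the covered stretch.
        while index < len(ordered) and ordered[index][1] <= covered:
            if best is None or ordered[index][2] > best[2]:
                best = ordered[index]
            index += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone extends coverage beyond position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical clone coordinates in kilobases.
clones = [("A", 0, 150), ("B", 50, 180), ("C", 100, 300),
          ("D", 160, 320), ("E", 290, 450)]
print(minimal_tiling_path(clones, 0, 450))
```

For the hypothetical coordinates above the sketch selects clones A, C and E, leaving out B and D, whose inserts are already spanned by their neighbors.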

For example, the genomic DNA in the inserts of the above described BAC vectors is ordered using STS markers whose
positions relative to one another and locations along the genome are known, using procedures such as those described
herein. The STS markers used to order the BAC inserts may be the STS markers contained in the integrated maps
described above. Alternatively, the STSs may be STSs which are not contained in any of the physical maps described
above. In another embodiment, the STSs may be a combination of STSs included in the physical maps described above
and STSs which are not included in the integrated maps described above.

The BAC vectors are screened with STSs until there is at least one positive BAC clone per STS. Preferably, a
minimally overlapping set of 10,000 to 30,000 BACs having genomic inserts spanning the entire human genome is
identified. More preferably, a minimally overlapping set of 10,000 to 30,000 BACs having genomic inserts of about
100-300 kb in length spanning the entire human genome is identified. In a preferred embodiment, a minimally overlapping set
of 10,000 to 30,000 BACs having genomic inserts of about 100-150 kb in length spanning the entire human genome is
identified. In a highly preferred embodiment, a minimally overlapping set of 15,000 to 25,000 BACs having genomic
inserts of about 100-200 kb in length spanning the entire human genome is identified. Alternatively, a smaller number of
BACs spanning a set of chromosomes, a single chromosome, a particular subchromosomal region, or any other desired
portion of the genome may be ordered. The BACs may be screened for the presence of STSs as described in Example 2
below.

Example 2
Ordering of a BAC Library: Screening Clones with STSs
The BAC library is screened with a set of PCR-typeable STSs to identify clones containing the STSs. To
facilitate PCR screening of several thousand clones, for example 200,000 clones, pools of clones are prepared.

Three-dimensional pools of the BAC libraries are prepared as described in Chumakov et al. and are screened for
the ability to generate an amplification fragment in amplification reactions conducted using primers derived from the
ordered STSs (Chumakov et al. (1995), supra). A BAC library typically contains 200,000 BAC clones. Since the average
size of each insert is 100-300 kb, the overall size of such a library is equivalent to the size of at least about 7 human
genomes. This library is stored as an array of individual clones in 518 384-well plates. It can be divided into 74 primary
pools (7 plates each). Each primary pool can then be divided into 48 subpools prepared by using a three-dimensional
pooling system based on the plate, row and column address of each clone (more particularly, 7 subpools consisting of all
clones residing in a given microtiter plate; 16 subpools consisting of all clones in a given row; 24 subpools consisting of
all clones in a given column).
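
The library figures given above can be checked with a short calculation. The following is only an illustrative sketch: the function names are hypothetical, and the genome size of 3 x 10^6 kb is an assumption taken from the estimate used later in this description.

```python
import math

GENOME_KB = 3_000_000  # assumed genome size, about 3 x 10^6 kb


def genome_equivalents(n_clones, insert_kb, genome_kb=GENOME_KB):
    """Total insert length of a library, expressed in genome equivalents."""
    return n_clones * insert_kb / genome_kb


def primary_pool_count(n_plates, plates_per_pool):
    """Number of primary pools needed to hold all plates of the library."""
    return math.ceil(n_plates / plates_per_pool)


def well_capacity(n_plates, wells_per_plate=384):
    """Number of individual clones that the plate array can store."""
    return n_plates * wells_per_plate


if __name__ == "__main__":
    # 200,000 clones with 100 kb inserts (the low end of the insert range)
    print(round(genome_equivalents(200_000, 100), 2))
    print(primary_pool_count(518, 7))
    print(well_capacity(518))
```

With inserts at the low end of the stated range (100 kb), 200,000 clones already amount to roughly 7 genome equivalents, and 518 plates of 384 wells hold just under 200,000 clones in 74 primary pools of 7 plates.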

Amplification reactions are conducted on the pooled BAC clones using primers specific for the STSs. For
example, the three dimensional pools may be screened with 45,000 STSs whose positions relative to one another and
locations along the genome are known. Preferably, the three dimensional pools are screened with about 30,000 STSs
whose positions relative to one another and locations along the genome are known. In a highly preferred embodiment,
the three dimensional pools are screened with about 20,000 STSs whose positions relative to one another and locations
along the genome are known.

Amplification products resulting from the amplification reactions are detected by conventional agarose gel
electrophoresis combined with automatic image capturing and processing. PCR screening for an STS involves three
steps: (1) identifying the positive primary pools; (2) for each positive primary pool, identifying the positive plate, row and
column 'subpools' to obtain the address of the positive clone; (3) directly confirming the PCR assay on the identified
clone. PCR assays are performed with primers specifically defining the STS.
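
Step (2) of the screening, in which positive plate, row and column subpools are converted into a clone address, can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the data representation (tuples such as `("plate", 3)`), the row lettering A to P and the function names are all hypothetical.

```python
from itertools import product

ROWS = "ABCDEFGHIJKLMNOP"   # 16 rows of a 384-well plate (assumed lettering)
COLS = range(1, 25)         # 24 columns of a 384-well plate
PLATES = range(1, 8)        # 7 plates per primary pool


def subpools_for_clone(plate, row, col):
    """Return the plate, row and column subpools that contain a given clone."""
    if plate not in PLATES or row not in ROWS or col not in COLS:
        raise ValueError("address outside the primary pool")
    return {("plate", plate), ("row", row), ("col", col)}


def decode_addresses(positive_subpools):
    """List candidate clone addresses implied by the positive subpools."""
    plates = sorted(v for kind, v in positive_subpools if kind == "plate")
    rows = sorted(v for kind, v in positive_subpools if kind == "row")
    cols = sorted(v for kind, v in positive_subpools if kind == "col")
    # Every plate/row/column combination is a candidate address.
    return list(product(plates, rows, cols))


if __name__ == "__main__":
    positives = subpools_for_clone(3, "C", 7)
    print(decode_addresses(positives))
```

When a primary pool contains a single positive clone, the three positive subpools give one unambiguous address. When it contains two positive clones, up to eight candidate addresses result, which is one reason the direct confirmation of step (3) is useful.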

Screening is conducted as follows. First, BAC DNA containing the genomic inserts is prepared as follows.
Bacteria containing the BACs are grown overnight at 37°C in 120 µl of LB containing chloramphenicol (12 µg/ml). DNA
is extracted by the following protocol:

Centrifuge 10 min at 4°C and 2000 rpm
Eliminate supernatant and resuspend pellet in 120 µl TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM)
Centrifuge 10 min at 4°C and 2000 rpm
Eliminate supernatant and incubate pellet with 20 µl lysozyme 1 mg/ml for 15 min at room temperature
Add 20 µl proteinase K 100 µg/ml and incubate 15 min at 60°C
Add 8 µl DNAse 2 U/µl and incubate 1 hr at room temperature
Add 100 µl TE 10-2 and keep at -80°C

PCR assays are performed using the following protocol:

    Final volume
    BAC DNA                                                 1.7 ng/µl
    MgCl2                                                   2 mM
    dNTP (each)                                             200 µM
    primer (each)                                           2.9 ng/µl
    Ampli Taq Gold DNA polymerase                           0.05 unit/µl
    PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)     1x

The amplification is performed on a Genius II thermocycler. After heating at 95°C for 10 min, 40 cycles are performed.
Each cycle comprises: 30 sec at 95°C, 54°C for 1 min, and 30 sec at 72°C. For final elongation, 10 min at 72°C ends
the amplification. PCR products are analyzed on 1% agarose gel with 0.1 mg/ml ethidium bromide.

Alternatively, a YAC (Yeast Artificial Chromosome) library can be used. The very large insert size, of the order
of 1 megabase, is the main advantage of the YAC libraries. The library can typically include about 33,000 YAC clones as
described in Chumakov et al. (1995, supra). The YAC screening protocol may be the same as the one used for BAC
screening.

The known order of the STSs is then used to align the BAC inserts in an ordered array (contig) spanning the
whole human genome. If necessary, new STSs to be tested can be generated by sequencing the ends of selected BAC
inserts. Subchromosomal localization of the BACs can be established and/or verified by fluorescence in situ hybridization
(FISH), performed on metaphasic chromosomes as described by Cherif et al. 1990 and in Example 8 below. BAC insert
size may be determined by Pulsed Field Gel Electrophoresis after digestion with the restriction enzyme NotI.

Finally, a minimally overlapping set of BAC clones, with known insert size and subchromosomal location,
covering the entire genome, a set of chromosomes, a single chromosome, a particular subchromosomal region, or any
other desired portion of the genome is selected from the DNA library. For example, the BAC clones may cover at least
100kb of contiguous genomic DNA, at least 250kb of contiguous genomic DNA, at least 500kb of contiguous genomic
DNA, at least 2Mb of contiguous genomic DNA, at least 5Mb of contiguous genomic DNA, at least 10Mb of contiguous
genomic DNA, or at least 20Mb of contiguous genomic DNA.

Identification of biallelic markers
In order to generate polymorphisms having the adequate informative content to be used as biallelic markers for
genetic mapping, the sequences of random genomic fragments from an appropriate number of unrelated individuals are
compared. Genomic sequences to be screened for biallelic markers may be generated by partially sequencing BAC
inserts, preferably by sequencing the ends of BAC subclones. Sequencing the ends of an adequate number of BAC
subclones derived from a minimally overlapping array of BACs such as those described above will allow the generation of
biallelic markers spanning the entire genome, a set of chromosomes, a single chromosome, a particular subchromosomal
region, or any other desired portion of the genome with an optimized inter-marker spacing.

Thus, portions of the BACs in the selected ordered array are then subcloned and sequenced using, for example,
the procedures described below.

Example 3
Subcloning of BACs

The cells obtained from three liters overnight culture of each BAC clone are treated by alkaline lysis using
conventional techniques to obtain the BAC DNA containing the genomic DNA inserts. After centrifugation of the BAC
DNA in a cesium chloride gradient, ca. 50 µg of BAC DNA are purified. 5-10 µg of BAC DNA are sonicated using three
distinct conditions, to obtain fragments within a desired size range. The obtained DNA fragments are end-repaired in a
50 µl volume with two units of Vent polymerase for 20 min at 70°C, in the presence of the four deoxytriphosphates
(100 µM). The resulting blunt-ended fragments are separated by electrophoresis on preparative low-melting point 1%
agarose gels (60 Volts for 3 hours). The fragments lying within a desired size range, such as 600 to 6,000 bp, are
excised from the gel and treated with agarase. After chloroform extraction and dialysis on Microcon 100 columns, DNA
in solution is adjusted to a 100 ng/µl concentration. A ligation to a linearized, dephosphorylated, blunt-ended plasmid
cloning vector is performed overnight by adding 100 ng of BAC fragmented DNA to 20 ng of pBluescript II SK (+) vector
DNA linearized by enzymatic digestion, and treating with alkaline phosphatase. The ligation reaction is performed in a
10 µl final volume in the presence of 40 units/µl T4 DNA ligase (Epicentre). The ligated products are electroporated into
the appropriate cells (ElectroMAX E. coli DH10B cells). IPTG and X-gal are added to the cell mixture, which is then
spread on the surface of an ampicillin-containing agar plate. After overnight incubation at 37°C, recombinant (white)
colonies are randomly picked and arrayed in 96 well microplates for storage and sequencing.

Alternatively, BAC subcloning may be performed using vectors which possess both a high copy number origin
of replication, which facilitates the isolation of the vector DNA, and a low copy number origin of replication. Cloning of
a genomic DNA fragment into the high copy number origin of replication inactivates the origin such that clones
containing a genomic insert replicate at low copy number. The low copy number of clones having a genomic insert
therein permits the inserts to be stably maintained. In addition, selection procedures may be designed which enable low
copy number plasmids (i.e. vectors having genomic inserts therein) to be selected. In a preferred embodiment, BAC
subcloning will be performed in vectors having the above described features and moreover enabling high throughput
sequencing of long fragments of genomic DNA. Such high throughput high quality sequencing may be obtained after
generating successive deletions within the subcloned fragments to be sequenced, using transposition-based or enzymatic
systems. Such vectors are described in the U.S. Patent Application entitled "High Throughput DNA Sequencing Vector"
(GENSET.015A, Serial No. 09/058,746), the disclosure of which is incorporated herein by reference.

It will be appreciated that other subcloning methods familiar to those skilled in the art may also be employed. 

The resulting subclones are then partially sequenced using, for example, the procedures described below.

Example 4
Partial sequencing of BAC subclones
The genomic DNA inserts in the subclones, such as the BAC subclones prepared above, are amplified by
conducting PCR reactions on the overnight bacterial cultures, using primers complementary to vector sequences flanking
the insertions.

The sequences of the insert extremities (on average 500 bases at each end, obtained under routine sequencing
conditions) are determined by fluorescent automated sequencing on ABI 377 sequencers, using ABI Prism DNA
Sequencing Analysis software. Following gel image analysis and DNA sequence extraction, sequence data are
automatically processed with adequate software to assess sequence quality. A proprietary base-caller automatically
flags suspect peaks, taking into account the shape of the peaks, the inter-peak resolution, and the noise level. The
proprietary base-caller also performs an automatic trimming. Any stretch of 25 or fewer bases having more than 4 suspect
peaks is usually considered unreliable and is discarded.
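
The discard rule stated above (a stretch of 25 or fewer bases with more than 4 suspect peaks) can be illustrated with a sliding-window sketch. The proprietary base-caller itself is not described in this text, so the code below is only one plausible reading of the rule; the function name, the boolean flag representation and the choice to discard every base of an offending window are assumptions.

```python
def unreliable_mask(suspect, window=25, max_suspect=4):
    """Mark bases lying in any window of `window` bases with more than
    `max_suspect` suspect peaks. `suspect` is a list of booleans, one per base."""
    n = len(suspect)
    mask = [False] * n
    if n == 0:
        return mask
    w = min(window, n)                # short reads are treated as one window
    count = sum(suspect[:w])          # suspect peaks in the first window
    for start in range(n - w + 1):
        if start > 0:
            # Slide the window by one base and update the running count.
            count += suspect[start + w - 1] - suspect[start - 1]
        if count > max_suspect:
            for i in range(start, start + w):
                mask[i] = True
    return mask


if __name__ == "__main__":
    flags = [False] * 60
    for position in (0, 2, 4, 6, 8):  # five suspect peaks near the read start
        flags[position] = True
    mask = unreliable_mask(flags)
    print(sum(mask), "bases discarded out of", len(mask))
```

Checking only full 25-base windows is sufficient here, because any shorter stretch with more than 4 suspect peaks lies inside at least one such window.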

The sequenced regions of the subclones, such as the BAC subclones prepared above, are then analyzed in
order to identify biallelic markers lying therein. The frequency at which biallelic markers will be detected in the
screening process varies with the average level of heterozygosity desired. For example, if biallelic markers having an
average heterozygosity rate of greater than 0.42 are desired, they will occur every 2.5 to 3 kb on average. Therefore,
on average, six 500 bp-genomic fragments have to be screened in order to derive 1 biallelic marker having an adequate
informative content.
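
The screening effort implied by these figures follows from simple division. The sketch below is illustrative only; the function names are hypothetical and the calculation assumes that markers are evenly spaced, which the text states only as an average.

```python
def fragments_per_marker(marker_spacing_bp, fragment_bp=500):
    """Fragments of `fragment_bp` to screen, on average, to find one marker."""
    return -(-marker_spacing_bp // fragment_bp)   # integer ceiling division


def fragments_to_screen(n_markers, marker_spacing_bp, fragment_bp=500):
    """Average number of fragments to screen to obtain `n_markers` markers."""
    return n_markers * fragments_per_marker(marker_spacing_bp, fragment_bp)


if __name__ == "__main__":
    print(fragments_per_marker(3000))        # one marker every 3 kb
    print(fragments_to_screen(1000, 3000))   # effort for 1,000 markers
```

At one marker per 3 kb, six 500 bp fragments are needed per marker, which matches the figure given in the text; at one marker per 2.5 kb the average drops to five.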

As a preferred alternative to sequencing the ends of an adequate number of BAC subclones, the above
mentioned high throughput deletion-based sequencing vectors, which allow the generation of high quality sequence
information covering fragments of ca. Bkb, may be used. Having sequence fragments longer than 2.5 or 3kb enhances 
the chances of identifying biallelic markers therein. Methods of constructing and sequencing a nested set of deletions
are disclosed in the U.S. Patent Application entitled "High Throughput DNA Sequencing Vector" (GENSET.015A, Serial
No. 09/058,746), the disclosure of which is incorporated herein by reference.

To identify biallelic markers using partial sequence information derived from subclone ends,
such as the ends of the BAC subclones prepared above, pairs of primers, each one specifically
defining a 500 bp amplification fragment, are designed using the above mentioned partial sequences.
The primers used for the genomic amplification of fragments derived from the subclones, such as
the BAC subclones prepared above, may be designed using the OSP software (Hillier L. and Green
P., Methods Appl., 1:124-8 (1991), the disclosure of which is incorporated herein by reference). The
GC content of the amplification primers preferably ranges between 10 and 75 %, more preferably
between 35 and 60 %, and most preferably between 40 and 55 %. The length of amplification
primers can range from 10 to 100 nucleotides, preferably from 10 to 50, 10 to 30 or more preferably
10 to 20 nucleotides. Shorter primers tend to lack specificity for a target nucleic acid sequence and
generally require cooler temperatures to form sufficiently stable hybrid complexes with the
template. Longer primers are expensive to produce and can sometimes self-hybridize to form hairpin
structures.
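
The GC-content and length ranges stated above can be applied as a simple filter on candidate primers. The sketch below is an illustration only and is not the OSP software; the function names and the class labels are hypothetical.

```python
# GC-content ranges from the text, narrowest first (percent, inclusive).
GC_RANGES = [
    ("most preferred", 40, 55),
    ("more preferred", 35, 60),
    ("preferred", 10, 75),
]


def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    if not primer:
        raise ValueError("empty primer")
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def gc_class(primer):
    """Return the narrowest stated GC range that the primer falls into."""
    gc = gc_percent(primer)
    for label, low, high in GC_RANGES:
        if low <= gc <= high:
            return label
    return "outside stated ranges"


def length_ok(primer, low=10, high=100):
    """Check the primer length against a stated range (default 10 to 100 nt)."""
    return low <= len(primer) <= high


if __name__ == "__main__":
    candidate = "ATGCATGCATGCATGCATGC"
    print(gc_percent(candidate), gc_class(candidate), length_ok(candidate, 10, 20))
```

The narrower length ranges in the text (10 to 50, 10 to 30, 10 to 20 nucleotides) can be checked by passing different `low` and `high` values to `length_ok`.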

All primers may contain, upstream of the specific target bases, a common oligonucleotide tail that serves as a
sequencing primer. Those skilled in the art are familiar with primer extensions which can be used for these purposes.

To identify biallelic markers, the sequences corresponding to the partial sequences determined above are
determined and compared in a plurality of individuals. The population used to identify biallelic markers having an
adequate informative content preferably consists of ca. 100 unrelated individuals from a heterogeneous population.

First, DNA is extracted from the peripheral venous blood of each donor using methods such as those described
in Example 5.

Example 5
Extraction of DNA

30 ml of blood are taken from the individuals in the presence of EDTA. Cells (pellet) are collected after
centrifugation for 10 minutes at 2000 rpm. Red cells are lysed by a lysis solution (50 ml final volume: 10 mM Tris
pH 7.6; 5 mM MgCl2; 10 mM NaCl). The solution is centrifuged (10 minutes, 2000 rpm) as many times as necessary to
eliminate the residual red cells present in the supernatant, after resuspension of the pellet in the lysis solution.

The pellet of white cells is lysed overnight at 42°C with 3.7 ml of lysis solution composed of:
• 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
• 200 µl SDS 10%
• 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) is added. After vigorous agitation, the
solution is centrifuged for 20 minutes at 10000 rpm.

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol are added to the previous supernatant, and the solution is
centrifuged for 30 minutes at 2000 rpm. The DNA solution is rinsed three times with 70% ethanol to eliminate salts,
and centrifuged for 20 minutes at 2000 rpm. The pellet is dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml
water. The DNA concentration is evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA).

To evaluate the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio is determined. Only DNA
preparations having an OD 260 / OD 280 ratio between 1.8 and 2 are used in the subsequent steps described below.
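
The two optical density checks described above amount to a conversion and a range test. A minimal sketch follows; the function names and the `dilution_factor` parameter are hypothetical additions for illustration, while the conversion factor and the acceptance range are those given in the text.

```python
UG_PER_ML_PER_OD260 = 50.0  # from the text: 1 OD unit at 260 nm = 50 µg/ml DNA


def dna_concentration(od260, dilution_factor=1.0):
    """DNA concentration in µg/ml from the OD at 260 nm.
    `dilution_factor` (an assumption) corrects for a diluted reading."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def passes_purity(od260, od280, low=1.8, high=2.0):
    """True if the OD 260 / OD 280 ratio lies in the accepted range."""
    ratio = od260 / od280
    return low <= ratio <= high


if __name__ == "__main__":
    print(dna_concentration(0.5))       # undiluted reading
    print(passes_purity(0.95, 0.5))     # ratio 1.9
```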

Once genomic DNA from every individual in the given population has been extracted, it is preferred that a
fraction of each DNA sample is separated, after which a pool of DNA is constituted by assembling equivalent DNA
amounts of the separated fractions into a single one.

Second, the DNA obtained from peripheral blood as described above is amplified using the above mentioned
amplification primers.

Example 6 provides procedures that may be used in the amplification reactions, and the detection of
polymorphisms within the obtained amplicons.

Example 6

Amplification of DNA from Peripheral Blood
and Identification of Biallelic Markers
The amplification of each sequence is performed on pooled DNA samples obtained as in Example 5 above, using
PCR (Polymerase Chain Reaction) as follows:

    final volume
    genomic DNA                                             2 ng/µl
    dNTP (each)                                             200 µM
    primer (each)                                           2.9 ng/µl
    Ampli Taq Gold DNA polymerase (Perkin)                  0.05 unit/µl
    PCR buffer (10X = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)     1X

The synthesis of primers is performed following the phosphoramidite method, on a
GENSET UFPS 24.1 synthesizer.

To reduce the expense of preparing amplification primers for use in the above procedures, short primers may be
used. While primers and probes having between 15 and 20 (or more) nucleotides are usually highly specific to a given
nucleic acid sequence, it may be inconvenient and expensive to synthesize a relatively long oligonucleotide for each
analysis. In order to at least partially circumvent this problem, it is often possible to use smaller but still relatively
specific oligonucleotides that are shorter in length to create a manageable library. For example, a library of
oligonucleotides comprising about 8 to 10 nucleotides is conceivable and has already been used for sequencing of a
40,000 bp cosmid DNA (Studier, Proc. Natl. Acad. Sci. USA 86(18):6917-6921 (1989), the disclosure of which is
incorporated herein by reference).

Another potential way to obtain specific primers and probes with a small library of oligonucleotides is to
generate longer, more specific primers and probes from combinations of shorter, less specific oligonucleotides. Libraries
of shorter oligonucleotides, each one being from about five to eight nucleotides in length, have already been used
(Kieleczawa et al., Science 258:1787-1791 (1992); Kotler et al., Proc. Natl. Acad. Sci. USA 90:4241-4245 (1993);
Kaczorowski and Szybalski, Anal. Biochem. 221:127-135 (1994), the disclosures of which are incorporated herein by
reference). Suitable probes and primers of appropriate length can therefore be designed through the association of two
or three shorter oligonucleotides to constitute modular primers. The association between primers can be either covalent,
resulting from the activity of DNA T4 ligase, or non-covalent through base-stacking energy.

The amplification is performed on a Perkin Elmer 9600 Thermocycler or MJ Research PTC200 with heating lid.
After heating at 95°C for 10 minutes, 40 cycles are performed. Each cycle comprises: 30 sec at 95°C, 1 minute at
54°C and 30 sec at 72°C. For final elongation, 10 minutes at 72°C ends the amplification.

The quantities of the amplification products obtained are determined on 96-well microtiter plates, using a
fluorimeter and Picogreen as intercalating agent (Molecular Probes).

The sequences of the amplification products are determined using automated dideoxy terminator sequencing
reactions with a dye-primer cycle sequencing protocol. The products of the sequencing reactions are run on sequencing
gels and the sequences are determined using gel image analysis.

The sequence data are evaluated using software designed to detect the presence of biallelic sites among the
pooled amplified fragments. The polymorphism search is based on the presence of superimposed peaks in the
electrophoresis pattern resulting from different bases occurring at the same position. Because each dideoxy terminator
is labeled with a different fluorescent molecule, the two peaks corresponding to a biallelic site present distinct colors
corresponding to two different nucleotides at the same position on the sequence. The software evaluates the intensity
ratio between the two peaks and the intensity ratio between a given peak and surrounding peaks of the same color.

However, the presence of two peaks can be an artifact due to background noise. To exclude such an artifact,
the two DNA strands are sequenced and a comparison between the peaks is carried out. In order to be registered as a
polymorphic sequence, the polymorphism has to be detected on both strands.
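
The both-strand requirement can be expressed as an intersection of candidate sites after the reverse-strand calls have been mapped to forward-strand coordinates. The detection software is not described in detail here, so the sketch below is only an illustration: the representation of a call as a (position, alleles) pair, the 0-based coordinates and the function names are assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_forward(site, amplicon_length):
    """Convert a reverse-strand call (0-based position, alleles) into
    forward-strand coordinates with complemented alleles."""
    position, alleles = site
    forward_position = amplicon_length - 1 - position
    return (forward_position, frozenset(COMPLEMENT[a] for a in alleles))


def confirmed_sites(forward_calls, reverse_calls, amplicon_length):
    """Keep only candidate biallelic sites detected on both strands."""
    forward = {(p, frozenset(a)) for p, a in forward_calls}
    reverse = {to_forward(site, amplicon_length) for site in reverse_calls}
    shared = forward & reverse
    return sorted((p, "".join(sorted(a))) for p, a in shared)


if __name__ == "__main__":
    forward_calls = [(120, "AG"), (300, "CT")]
    reverse_calls = [(379, "TC")]   # maps to forward position 120, alleles A/G
    print(confirmed_sites(forward_calls, reverse_calls, 500))
```

In this example the candidate at position 300 is seen on one strand only and is therefore not registered, in line with the rule stated above.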

The above procedure permits those amplification products which contain biallelic markers to be identified.
The detection limit for the frequency of biallelic polymorphisms detected by sequencing pools of 100
individuals is about 10% for the minor allele, as verified by sequencing pools of known allelic frequencies. However,
more than 90% of the biallelic polymorphisms detected by the pooling method have a frequency for the minor allele
higher than 25%. Therefore, the biallelic markers selected by this method have a frequency of at least 10% for the minor
allele and 90% or less for the major allele, preferably at least 20% for the minor allele and 80% or less for the major
allele, more preferably at least 30% for the minor allele and 70% or less for the major allele, thus a heterozygosity rate
higher than 0.18, preferably higher than 0.32, more preferably higher than 0.42.
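
The heterozygosity rates quoted above follow from the allele frequencies if the expected heterozygosity of a biallelic marker is taken as 2pq, with p and q the two allele frequencies. This is the usual Hardy-Weinberg expression and is stated here as an assumption, since the text gives only the resulting values; the function name is hypothetical.

```python
def heterozygosity(minor_allele_frequency):
    """Expected heterozygosity 2pq of a biallelic marker (Hardy-Weinberg assumed)."""
    p = minor_allele_frequency
    if not 0 <= p <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2 * p * (1 - p)


if __name__ == "__main__":
    for frequency in (0.10, 0.20, 0.30):
        print(frequency, round(heterozygosity(frequency), 2))
```

Minor allele frequencies of 10%, 20% and 30% correspond to heterozygosity rates of 0.18, 0.32 and 0.42 respectively.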

In an initial study to determine the frequency of biallelic markers in the human genome that can be obtained
using the above methods, the following results were obtained. 300 different amplicons derived from 100 individuals, and
covering a total of 150 kb obtained from different genomic regions, were sequenced. A total of 54 biallelic
polymorphisms were identified, indicating that there is one biallelic polymorphism with a heterozygosity rate higher than
0.18 (frequency of the minor allele higher than 10%), preferably higher than 0.38 (frequency of the minor allele higher
than 25%), every 2.5 to 3 kb. Given that the human genome is about 3 x 10^6 kb long, this indicates that, out of the 10^7
biallelic markers present on the human genome, approximately 10^6 have adequate heterozygosity rates for genetic
mapping purposes.
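
The density estimate in this paragraph can be reproduced with two divisions. The sketch below is illustrative; the function names are hypothetical and the calculation assumes that the 150 kb sampled is representative of the whole genome.

```python
def kb_per_marker(total_kb, n_markers):
    """Average spacing between markers in the sequenced sample, in kb."""
    return total_kb / n_markers


def genome_marker_estimate(genome_kb, spacing_kb):
    """Number of markers expected genome-wide at a given average spacing."""
    return genome_kb / spacing_kb


if __name__ == "__main__":
    print(round(kb_per_marker(150, 54), 2))          # observed spacing
    print(genome_marker_estimate(3_000_000, 3.0))    # one marker per 3 kb
    print(genome_marker_estimate(3_000_000, 2.5))    # one marker per 2.5 kb
```

54 polymorphisms in 150 kb give one marker about every 2.8 kb, and a spacing of 2.5 to 3 kb over a genome of about 3 x 10^6 kb gives on the order of 10^6 markers.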

Using the procedures of Examples 1-6, sets containing increasing numbers of biallelic markers may be
constructed. For example, the procedures of Examples 1-6 are used to identify 1 to about 50 biallelic markers. In some
embodiments, the procedures of Examples 1-6 are used to identify about 50 to about 200 biallelic markers. In other
embodiments, the procedures of Examples 1-6 are used to identify about 200 to about 500 biallelic markers. In some
embodiments, the procedures of Examples 1-6 are used to identify about 1,000 biallelic markers. In other embodiments,
the procedures of Examples 1-6 are used to identify about 3,000 biallelic markers. In further embodiments, the
procedures of Examples 1-6 are used to identify about 5,000 biallelic markers. In another embodiment, the procedures
of Examples 1-6 are used to identify about 10,000 biallelic markers. In still another embodiment, the procedures of
Examples 1-6 are used to identify about 20,000 biallelic markers. In still another embodiment, the procedures of
Examples 1-6 are used to identify about 40,000 biallelic markers. In still another embodiment, the procedures of
Examples 1-6 are used to identify about 60,000 biallelic markers. In still another embodiment, the procedures of
Examples 1-6 are used to identify about 80,000 biallelic markers. In a still another embodiment, the procedures of
Examples 1-6 are used to identify more than 100,000 biallelic markers. In a further embodiment, the procedures of
Examples 1-6 are used to identify more than 120,000 biallelic markers.

As discussed above, the ordered nucleic acids, such as the inserts in BAC clones, which contain the biallelic
markers of the present invention may span a portion of the genome. For example, the ordered nucleic acids may span at
least 100kb of contiguous genomic DNA, at least 250kb of contiguous genomic DNA, at least 500kb of contiguous
genomic DNA, at least 2Mb of contiguous genomic DNA, at least 5Mb of contiguous genomic DNA, at least 10Mb of
contiguous genomic DNA, or at least 20Mb of contiguous genomic DNA.

In addition, groups of biallelic markers located in proximity to one another along the genome may be identified
within these portions of the genome for use in haplotyping analyses as described below. The biallelic markers included
in each of these groups may be located within a genomic region spanning less than 1kb, from 1 to 5kb, from 5 to 10kb,
from 10 to 25kb, from 25 to 50kb, from 50 to 150kb, from 150 to 250kb, from 250 to 500kb, from 500kb to 1Mb, or
more than 1Mb. It will be appreciated that the ordered DNA fragments containing these groups of biallelic markers need not
completely cover the genomic regions of these lengths but may instead be incomplete contigs having one or more gaps
therein. As discussed in further detail below, biallelic markers may be used in single marker and haplotype association
analyses regardless of the completeness of the corresponding physical contig harboring them.

Using the procedures above, 653 biallelic markers, each having two alleles, were identified using sequences
obtained from BACs which had been localized on the genome. In some cases, markers were identified using pooled BACs
and thereafter reassigned to individual BACs using STS screening procedures such as those described in Examples 2 and
7. The sequences of 50 of these 653 biallelic markers are provided in the accompanying Sequence Listing as SEQ ID
Nos. 1-50 and 51-100 (with SEQ ID Nos. 1-50 being one allele of these 50 biallelic markers and SEQ ID Nos. 51-100
being the other allele of these 50 biallelic markers). Although the sequences of SEQ ID Nos. 1-50 and 51-100 will be
used as exemplary markers throughout the present application, it will be appreciated that the biallelic markers used in
the maps of the present invention are not limited to these particular markers, nor are they limited to having the exact
flanking sequences surrounding the polymorphic bases which are enumerated in SEQ ID Nos. 1-50 and 51-100. Rather,
it will be appreciated that the flanking sequences surrounding the polymorphic bases of SEQ ID Nos. 1-50 and 51-100
may be lengthened or shortened to any extent compatible with their intended use and the present invention specifically
contemplates such sequences. The sequences of these 653 biallelic markers, including the sequences of SEQ ID Nos. 1- 
50 and 51-100 may be used to construct the maps of the present invention as well as in the gene identification and 
diagnostic techniques described herein. It will be appreciated that the biallelic markers referred to herein may be of any 
length compatible with their intended use provided that the markers include the polymorphic base, and the present 
invention specifically contemplates such sequences.

Ordering of biallelic markers
Biallelic markers can be ordered to determine their positions along chromosomes, preferably subchromosomal
regions, most preferably along the above described minimally overlapping ordered BAC arrays, as follows.

The positions of the biallelic markers along chromosomes may be determined using a variety of methodologies.
In one approach, radiation hybrid mapping is used. Radiation hybrid (RH) mapping is a somatic cell genetic approach that
can be used for high resolution mapping of the human genome. In this approach, cell lines containing one or more human
chromosomes are lethally irradiated, breaking each chromosome into fragments whose size depends on the radiation dose.
These fragments are rescued by fusion with cultured rodent cells, yielding subclones containing different portions of the
human genome. This technique is described by Benham et al. (Genomics 4:509-517, 1989) and Cox et al. (Science
250:245-250, 1990), the entire contents of which are hereby incorporated by reference. The random and independent
nature of the subclones permits efficient mapping of any human genome marker. Human DNA isolated from a panel of 80-
100 cell lines provides a mapping reagent for ordering biallelic markers. In this approach, the frequency of breakage
between markers is used to measure distance, allowing construction of fine resolution maps as has been done for ESTs
(Schuler et al., Science 274:540-546, 1996, hereby incorporated by reference).
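
How a breakage frequency between two markers might be derived from a hybrid panel can be sketched as follows. The text does not give a formula, so the estimator below is an assumption: it follows the form commonly attributed to the radiation hybrid literature, in which the number of hybrids retaining only one of the two markers is divided by T(R_A + R_B - 2 R_A R_B), with T the number of hybrids and R_A, R_B the retention fractions, and in which the distance is taken as -ln(1 - theta). All function names are hypothetical.

```python
import math


def breakage_frequency(retention_a, retention_b):
    """Estimate the breakage frequency (theta) between two markers from their
    retention patterns (one 0/1 entry per hybrid cell line)."""
    if len(retention_a) != len(retention_b) or not retention_a:
        raise ValueError("retention patterns must be non-empty and equal in length")
    total = len(retention_a)
    # Hybrids retaining exactly one of the two markers indicate a break.
    discordant = sum(a != b for a, b in zip(retention_a, retention_b))
    r_a = sum(retention_a) / total
    r_b = sum(retention_b) / total
    denominator = total * (r_a + r_b - 2 * r_a * r_b)
    if denominator == 0:
        raise ValueError("markers retained in all or none of the hybrids")
    return discordant / denominator


def distance_centirays(theta):
    """Map distance in centiRays, assuming D = -ln(1 - theta)."""
    if theta >= 1:
        return math.inf
    return -100 * math.log(1 - theta)


if __name__ == "__main__":
    marker_a = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
    marker_b = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
    theta = breakage_frequency(marker_a, marker_b)
    print(round(theta, 3), round(distance_centirays(theta), 1))
```

Markers that are always retained together give a breakage frequency of zero, while markers retained independently give a value close to one; a real panel of 80 to 100 cell lines would give a far more stable estimate than the ten-hybrid toy data used here.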

RH mapping has been used to generate a high-resolution whole genome radiation hybrid map of human
chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine kinase (TK) (Foster et al., Genomics
33:185-192, 1996), the region surrounding the Gorlin syndrome gene (Obermayr et al., Eur. J. Hum. Genet. 4:242-245,
1996), 60 loci covering the entire short arm of chromosome 12 (Raeymaekers et al., Genomics 29:170-178, 1995), the
region of human chromosome 22 containing the neurofibromatosis type 2 locus (Frazer et al., Genomics 14:574-584, 1992)
and 13 loci on the long arm of chromosome 5 (Warrington et al., Genomics 11:701-708, 1991).

Alternatively, PCR based techniques and human-rodent somatic cell hybrids may be used to determine the
positions of the biallelic markers on the chromosomes. In such approaches, oligonucleotide primer pairs which are capable of
generating amplification products containing the polymorphic bases of the biallelic markers are designed. Preferably, the
oligonucleotide primers are 18-23 bp in length and are designed for PCR amplification. The creation of PCR primers from
known sequences is well known to those with skill in the art. For a review of PCR technology see Erlich, H.A., PCR
Technology: Principles and Applications for DNA Amplification, 1991, W.H. Freeman and Co., New York.

The primers are used in polymerase chain reactions (PCR) to amplify templates from total human genomic DNA.
PCR conditions are as follows: 60 ng of genomic DNA is used as a template for PCR with 80 ng of each oligonucleotide
primer, 0.6 unit of Taq polymerase, and 1 µCi of a 32P-labeled deoxycytidine triphosphate. The PCR is performed in a
microplate thermocycler (Techne) under the following conditions: 30 cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min;
with a final extension at 72°C for 10 min. The amplified products are analyzed on a 6% polyacrylamide sequencing gel and
visualized by autoradiography. If the length of the resulting PCR product is identical to the length expected for an
amplification product containing the polymorphic base of the biallelic marker, then the PCR reaction is repeated with DNA
templates from two panels of human-rodent somatic cell hybrids, BIOS PCRable DNA (BIOS Corporation) and NIGMS
Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ).



WO 99/04038    PCT/IB98/01193

PCR is used to screen a series of somatic cell hybrid cell lines containing defined sets of human chromosomes for
the presence of a given biallelic marker. DNA is isolated from the somatic hybrids and used as starting templates for PCR
reactions using the primer pairs from the biallelic marker. Only those somatic cell hybrids with chromosomes containing the
human sequence corresponding to the biallelic marker will yield an amplified fragment. The biallelic markers are assigned to
a chromosome by analysis of the segregation pattern of PCR products from the somatic hybrid DNA templates. The single
human chromosome present in all cell hybrids that give rise to an amplified fragment is the chromosome containing that
biallelic marker. For a review of techniques and analysis of results from somatic cell gene mapping experiments, see
Ledbetter et al., Genomics 6:475-481 (1990).
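
The segregation analysis described above can be sketched as a set operation over the chromosome content of each hybrid. The code is an illustration only: the panel composition, the hybrid names and the function name are hypothetical, and the additional step of excluding chromosomes found in negative hybrids is an assumption that goes beyond the rule stated in the text.

```python
def assign_chromosome(hybrids, positive):
    """Return the single human chromosome consistent with the PCR results,
    or None if the panel does not resolve it.

    hybrids:  dict mapping hybrid name -> set of human chromosomes it carries
    positive: set of hybrid names that yielded an amplified fragment
    """
    positive_sets = [hybrids[name] for name in positive]
    if not positive_sets:
        return None
    # Chromosomes present in every hybrid that gave an amplified fragment.
    candidates = set.intersection(*positive_sets)
    # Assumption: a chromosome carried by a negative hybrid is excluded.
    for name, chromosomes in hybrids.items():
        if name not in positive:
            candidates -= chromosomes
    return next(iter(candidates)) if len(candidates) == 1 else None


if __name__ == "__main__":
    panel = {
        "H1": {"1", "7", "12"},
        "H2": {"7", "9"},
        "H3": {"7", "12", "X"},
        "H4": {"1", "9", "X"},
    }
    print(assign_chromosome(panel, {"H1", "H2", "H3"}))
```

In the toy panel, chromosome 7 is the only chromosome shared by the three positive hybrids, so the marker is assigned to it.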

Example 7 describes a preferred method for positioning of biallelic markers on clones, such as BAC clones, 
obtained from genomic DNA libraries. 

Example 7 

Screening BAC libraries with biallelic markers
Amplification primers enabling the specific amplification of DNA fragments carrying the biallelic markers (including
the 653 biallelic markers obtained above, which include the sequences of SEQ ID Nos. 1-50 and 51-100) may be used to
screen clones in any genomic DNA library, preferably the BAC libraries described above, for the presence of the biallelic
markers.

Pairs of primers were designed which allowed the amplification of fragments carrying the 653 biallelic markers
obtained above. The amplification primers may be used to screen clones in a genomic DNA library for the presence of the
653 biallelic markers. For example, pairs of amplification primers of SEQ ID Nos. 101-150 and 151-200 may be used to
amplify fragments which include the polymorphic bases of the biallelic markers of SEQ ID Nos. 1-50 and 51-100.

It will be appreciated that amplification primers for the biallelic markers may be any sequences which allow the
specific amplification of any DNA fragment carrying the markers and may be designed using techniques familiar to those
skilled in the art. The amplification primers may be oligonucleotides of 8, 10, 15, 20 or more bases in length which
enable the amplification of any fragment carrying the polymorphic site in the markers. The polymorphic base may be in
the center of the amplification product or, alternatively, it may be located off-center. For example, in some
embodiments, the amplification product produced using these primers may be at least 100 bases in length (i.e. 50
nucleotides on each side of the polymorphic base in amplification products in which the polymorphic base is centrally
located). In other embodiments, the amplification product produced using these primers may be at least 500 bases in
length (i.e. 250 nucleotides on each side of the polymorphic base in amplification products in which the polymorphic base
is centrally located). In still further embodiments, the amplification product produced using these primers may be at
least 1000 bases in length (i.e. 500 nucleotides on each side of the polymorphic base in amplification products in which
the polymorphic base is centrally located). Amplification primers such as those described above are included within the
scope of the present invention.

The localization of biallelic markers on BAC clones is performed essentially as described in Example 2.
The BAC clones to be screened are distributed in three dimensional pools as described in Example 2.
Amplification reactions are conducted on the pooled BAC clones using primers specific for the biallelic markers
to identify BAC clones which contain the biallelic markers, using procedures essentially similar to those described in
Example 2.

Amplification products resulting from the amplification reactions are detected by conventional agarose gel
electrophoresis combined with automatic image capturing and processing. PCR screening for a biallelic marker involves
three steps: (1) identifying the positive primary pools; (2) for each positive primary pool, identifying the positive plate,
row and column 'subpools' to obtain the address of the positive clone; (3) directly confirming the PCR assay on the
identified clone. PCR assays are performed with primers defining the biallelic marker.
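
Step (2) of the screening can be pictured as decoding an address from three lists of positive subpools. The sketch below is a hypothetical illustration; the function name and the plate, row and column labels are assumptions and are not taken from Example 2.

```python
from itertools import product


def candidate_addresses(positive_plates, positive_rows, positive_columns):
    """List every (plate, row, column) address compatible with the positive subpools.

    With exactly one positive plate, row and column subpool the address is unique.
    Several positives in one dimension give several candidates, each of which
    still has to be confirmed directly by PCR on the individual clone (step 3).
    """
    return list(product(positive_plates, positive_rows, positive_columns))


print(candidate_addresses(["P12"], ["C"], [7]))
print(candidate_addresses(["P12", "P30"], ["C"], [7]))
```

The second call returns two candidate addresses, which is the situation in which the direct confirmation step is what resolves the ambiguity.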

Screening is conducted as follows. First, BAC DNA is isolated as follows. Bacteria containing the genomic
inserts are grown overnight at 37°C in 120 µl of LB containing chloramphenicol (12 µg/ml). DNA is extracted by the
following protocol:

Centrifuge 10 min at 4°C and 2000 rpm
Eliminate supernatant and resuspend pellet in 120 µl TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM)
Centrifuge 10 min at 4°C and 2000 rpm
Eliminate supernatant and incubate pellet with 20 µl lysozyme 1 mg/ml for 15 min at room temperature
Add 20 µl proteinase K 100 µg/ml and incubate 15 min at 60°C
Add 8 µl DNAse 2 U/µl and incubate 1 hr at room temperature
Add 100 µl TE 10-2 and keep at -80°C



PCR assays are performed using the following protocol:

Final volume                                            15 µl
BAC DNA                                                 1.7 ng/µl
MgCl2
dNTP (each)                                             200
primer (each)                                           2.9 ng/µl
AmpliTaq Gold DNA polymerase                            0.05 unit/µl
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)



The amplification is performed on a Genius II thermocycler. After heating at 95°C for 10 min, 40 cycles are
performed. Each cycle comprises: 30 sec at 95°C, 54°C for 1 min, and 30 sec at 72°C. For final elongation, 10 min at
72°C end the amplification. PCR products are analysed on 1% agarose gel with 0.1 mg/ml ethidium bromide.
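
As a small worked example of the protocol arithmetic, the following sketch converts the per-microlitre concentrations into per-reaction amounts for the 15 µl final volume and adds up the programmed hold times of the cycling profile. The function names are illustrative assumptions, and the time estimate ignores temperature ramping between steps.

```python
FINAL_VOLUME_UL = 15.0  # final reaction volume given in the protocol


def amount_per_reaction(concentration_per_ul, volume_ul=FINAL_VOLUME_UL):
    """Total amount of a component in one reaction (same unit as the concentration)."""
    return concentration_per_ul * volume_ul


def program_minutes(initial_min=10.0, cycles=40, step_minutes=(0.5, 1.0, 0.5),
                    final_min=10.0):
    """Sum of programmed hold times: initial heating, the cycles, final elongation."""
    return initial_min + cycles * sum(step_minutes) + final_min


print("BAC DNA (ng):", amount_per_reaction(1.7))
print("each primer (ng):", amount_per_reaction(2.9))
print("polymerase (units):", amount_per_reaction(0.05))
print("programmed time (min):", program_minutes())
```

The defaults of `program_minutes` mirror the profile above: 10 min of initial heating, 40 cycles of 30 sec, 1 min and 30 sec, and a 10 min final elongation.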

Using such procedures, a number of BAC clones carrying selected biallelic markers can be isolated. The
position of these BAC clones on the human genome can be defined by performing STS screening as described in Example
2. Preferably, to decrease the number of STSs to be tested, each BAC can be localized on chromosomal or
subchromosomal regions by procedures such as those described in Examples 8 and 9 below. This localization will allow
the selection of a subset of STSs corresponding to the identified chromosomal or subchromosomal region. Testing each
BAC with such a subset of STSs and taking account of the position and order of the STSs along the genome will allow a
refined positioning of the corresponding biallelic marker along the genome.

In other embodiments, if the DNA library used to isolate BAC inserts or any type of genomic DNA fragments
harboring the selected biallelic markers already constitutes a physical map of the genome or any portion thereof, using the
known order of the DNA fragments will allow the order of the biallelic markers to be established.

As discussed above, it will be appreciated that markers carried by the same fragment of genomic DNA, such as
the insert in a BAC clone, need not necessarily be ordered with respect to one another within the genomic fragment to
conduct single point or haplotype association analysis. However, in other embodiments of the present maps, the order of
biallelic markers carried by the same fragment of genomic DNA may be determined.

The positions of the biallelic markers used to construct the maps of the present invention, including the 653
biallelic markers obtained above, may be assigned to subchromosomal locations using Fluorescence In Situ Hybridization
(FISH) (Cherif et al., Proc. Natl. Acad. Sci. USA, 87:6639-6643 (1990), the disclosure of which is incorporated herein by
reference). FISH analysis is described in Example 8 below.

Example 8

Assignment of Biallelic Markers to Subchromosomal Regions
Metaphase chromosomes are prepared from phytohemagglutinin (PHA)-stimulated blood cell donors. PHA-
stimulated lymphocytes from healthy males are cultured for 72 h in RPMI-1640 medium. For synchronization, methotrexate
(10 µM) is added for 17 h, followed by addition of 5-bromodeoxyuridine (5-BudR, 0.1 mM) for 6 h. Colcemid (1 µg/ml) is
added for the last 15 min before harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic
solution of KCl (75 mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid (3:1). The cell suspension is
dropped onto a glass slide and air-dried.

BAC clones carrying the biallelic markers used to construct the maps of the present invention (including the 653
biallelic markers obtained above) can be isolated as described above. These BACs or portions thereof, including fragments
carrying said biallelic markers, obtained for example from amplification reactions using pairs of amplification primers as
described above, can be used as probes to be hybridized with metaphasic chromosomes. It will be appreciated that the
hybridization probes to be used in the contemplated method may be generated using alternative methods well known to
those skilled in the art. Hybridization probes may have any length suitable for this intended purpose.

Probes are then labeled with biotin-16 dUTP by nick translation according to the manufacturer's instructions
(Bethesda Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 column (Pharmacia, Uppsala, Sweden) and
precipitated. Just prior to hybridization, the DNA pellet is dissolved in hybridization buffer (50% formamide, 2 X SSC, 10%
dextran sulfate, 1 mg/ml sonicated salmon sperm DNA, pH 7) and the probe is denatured at 70°C for 5-10 min.

Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 µg/ml), rinsed three times in 2 X SSC and
dehydrated in an ethanol series. Chromosome preparations are denatured in 70% formamide, 2 X SSC for 2 min at 70°C,
then dehydrated at 4°C. The slides are treated with proteinase K (10 µg/100 ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37°C
for 8 min and dehydrated. The hybridization mixture containing the probe is placed on the slide, covered with a coverslip,
sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After hybridization and post-hybridization
washes, the biotinylated probe is detected by avidin-FITC and amplified with additional layers of biotinylated goat anti-avidin
and avidin-FITC. For chromosomal localization, fluorescent R-bands are obtained as previously described (Cherif et al. (1990),
supra). The slides are observed under a LEICA fluorescence microscope (DMRXA). Chromosomes are counterstained with
propidium iodide and the fluorescent signal of the probe appears as two symmetrical yellow-green spots on both chromatids
of the fluorescent R-band chromosome (red). Thus, a particular biallelic marker may be localized to a particular cytogenetic
R-band on a given chromosome.

The above procedure was used to confirm the subchromosomal location of 95% of the BAC clones harboring the
653 markers obtained above. In particular, the 50 markers of SEQ ID Nos. 1-50 and 51-100 were assigned to
subchromosomal regions of chromosome 21. Simple identification numbers were attributed to each BAC from which the
markers are derived. Figure 1 is a cytogenetic map of chromosome 21 indicating the subchromosomal regions therein. Table
1 lists the internal identification number of the localized biallelic markers, the internal identification number of the BACs from
which the markers were derived, the size of the BAC insert, the average intermarker distance in the BAC insert and the
subchromosomal locations of the biallelic markers. The sequences of the localized markers are provided as SEQ ID Nos. 1-50
and 51-100 in the accompanying sequence listing. Amplification primers for generating amplification products containing
the polymorphic bases of these markers are also provided as SEQ ID Nos. 101-150 and 151-200 in the accompanying
sequence listing. Microsequencing primers for use in determining the identities of the polymorphic bases of these biallelic
markers are provided in the accompanying sequence listing as SEQ ID Nos. 201-250 and 251-300.

The rate at which biallelic markers may be assigned to subchromosomal regions may be enhanced through
automation. For example, probe preparation may be performed in a microtiter plate format, using adequate robots. The rate
at which biallelic markers may be assigned to subchromosomal regions may be enhanced using techniques which permit the
in situ hybridization of multiple probes on a single microscope slide, such as those disclosed in Larin et al., Nucleic Acids
Research 22: 3689-3692 (1994), the disclosure of which is incorporated herein by reference. In the largest test format
described, different probes were hybridized simultaneously by applying them directly from a 96-well microtiter dish which
was inverted on a glass plate. Software for image data acquisition and analysis that is adapted to each optical system, test
format, and fluorescent probe used, can be derived from the system described in Lichter et al., Science 247: 64-69 (1990),
the disclosure of which is incorporated herein by reference. Such software measures the relative distance between the
center of the fluorescent spot corresponding to the hybridized probe and the telomeric end of the short arm of the
corresponding chromosome, as compared to the total length of the chromosome. The rate at which biallelic markers are
assigned to subchromosomal locations may be further enhanced by simultaneously applying probes labeled with different
fluorescent tags to each well of the 96-well dish. A further benefit of conducting the analysis on one slide is that it
facilitates automation, since a microscope having a moving stage and the capability of detecting fluorescent signals in
different metaphase chromosomes could provide the coordinates of each probe on the metaphase chromosomes distributed
on the 96-well dish.
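
The relative-distance measurement performed by such software amounts to a simple ratio. The sketch below is a hypothetical illustration: the function names, the pixel units and the averaging over several metaphases are assumptions, not features of the cited system.

```python
from statistics import mean


def fractional_position(spot_to_short_arm_end, chromosome_length):
    """Distance from the short-arm telomere to the spot, as a fraction of total length."""
    if chromosome_length <= 0:
        raise ValueError("chromosome length must be positive")
    if not 0 <= spot_to_short_arm_end <= chromosome_length:
        raise ValueError("spot must lie on the chromosome")
    return spot_to_short_arm_end / chromosome_length


def mean_fractional_position(measurements):
    """Average the ratio over several metaphases; items are (distance, length) pairs."""
    return mean(fractional_position(d, length) for d, length in measurements)


# Three hypothetical measurements of the same probe, in pixels.
print(mean_fractional_position([(20.0, 80.0), (30.0, 120.0), (25.0, 100.0)]))
```

Because the value is a ratio, it does not depend on how condensed the chromosome is in a given metaphase, which is why measurements from different cells can be pooled.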

Example 9 below describes an alternative method to position biallelic markers which allows their assignment to
human chromosomes.

Example 9

Assignment of Biallelic Markers to Human Chromosomes

The biallelic markers used to construct the maps of the present invention, including the 653 biallelic markers
obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100), may be assigned to a human
chromosome using monosomal analysis as described below.

The chromosomal localization of a biallelic marker can be performed through the use of somatic cell hybrid
panels. For example, 24 panels, each panel containing a different human chromosome, may be used (Russell et al.,
Somat. Cell Mol. Genet. 22:425-431 (1996); Drwinga et al., Genomics 16:311-314 (1993), the disclosures of which are
incorporated herein by reference).

The biallelic markers are localized as follows. The DNA of each somatic cell hybrid is extracted and purified.
Genomic DNA samples from a somatic cell hybrid panel are prepared as follows. Cells are lysed overnight at 42°C with
3.7 ml of lysis solution composed of:

3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M

200 µl SDS 10%

500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M)

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) is added. After vigorous agitation, the
solution is centrifuged for 20 min at 10,000 rpm. For the precipitation of DNA, 2 to 3 volumes of 100% ethanol are
added to the previous supernatant and the solution is centrifuged for 30 min at 2,000 rpm. The DNA solution is rinsed
three times with 70% ethanol to eliminate salts, and centrifuged for 20 min at 2,000 rpm. The pellet is dried at 37°C,
and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA concentration is evaluated by measuring the OD at 260 nm (1
unit OD = 50 µg/ml DNA). To determine the presence of proteins in the DNA solution, the OD260/OD280 ratio is
determined. Only DNA preparations having an OD260/OD280 ratio between 1.8 and 2 are used in the PCR assay.
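
The two quality checks above reduce to a multiplication and a ratio test. A minimal sketch follows; the function names and the optional `dilution_factor` argument are assumptions added for illustration.

```python
UG_PER_ML_PER_OD260 = 50.0  # 1 OD unit at 260 nm corresponds to 50 µg/ml DNA


def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Estimate DNA concentration from the absorbance at 260 nm.

    dilution_factor is an assumed convenience for samples diluted before reading.
    """
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def passes_purity_check(od260, od280, low=1.8, high=2.0):
    """True if the OD260/OD280 ratio lies in the range accepted for the PCR assay."""
    return low <= od260 / od280 <= high


print(dna_concentration_ug_per_ml(0.5))
print(passes_purity_check(0.95, 0.5))
```

A ratio below the lower bound is the usual sign of residual protein, which is what the check is meant to catch.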

Then, a PCR assay is performed on genomic DNA with primers defining the biallelic marker. The PCR assay is
performed as described above for BAC screening. The PCR products are analyzed on a 1% agarose gel containing 0.2
mg/ml ethidium bromide.

The ordering analyses described above may be conducted to generate an integrated genome-wide genetic map
comprising about 20,000 biallelic markers (1 biallelic marker per BAC if 20,000 BAC inserts are screened). In some
embodiments, the map includes one or more of the 653 markers obtained above (which include the sequences of SEQ ID
Nos. 1-50 and 51-100 or the sequences complementary thereto).

In another embodiment, the above procedures are conducted to generate a map comprising about 40,000
markers (an average of 2 biallelic markers per BAC if 20,000 BAC inserts are screened). In some embodiments, the map
includes one or more of the 653 markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100
or the sequences complementary thereto).

In a further preferred embodiment, the above procedures are conducted to generate a map
comprising about 60,000 markers (an average of 3 biallelic markers per BAC if 20,000 BAC inserts are screened). In
some embodiments, the map includes one or more of the 653 markers obtained above (which include the sequences of
SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto).

In a further preferred embodiment, the above procedures are conducted to generate a map
comprising about 80,000 markers (an average of 4 biallelic markers per BAC if 20,000 BAC inserts are screened). In
some embodiments, the map includes one or more of the 653 markers obtained above (which include the sequences of
SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto).

In yet another embodiment, the above procedures are conducted to generate a map comprising about 100,000
markers (an average of 5 biallelic markers per BAC if 20,000 BAC inserts are screened). In some embodiments, the map
includes one or more of the 653 markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100
or the sequences complementary thereto).

In a further embodiment, the above procedures are conducted to generate a map comprising about 120,000
markers (an average of 6 biallelic markers per BAC if 20,000 BAC inserts are screened). In some embodiments, the map
includes one or more of the 653 markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100
or the sequences complementary thereto).

Alternatively, maps having the above-specified average numbers of biallelic markers per BAC which comprise
smaller portions of the genome, such as a set of chromosomes, a single chromosome, a particular subchromosomal
region, or any other desired portion of the genome, may also be constructed using the procedures provided herein.

In some embodiments, the biallelic markers in the map are separated from one another by an average distance
of 10-200 kb. In further embodiments, the biallelic markers in the map are separated from one another by an average
distance of 15-150 kb. In yet another embodiment, the biallelic markers in the map are separated from one another by an
average distance of 20-100 kb. In other embodiments, the biallelic markers in the map are separated from one another
by an average distance of 100-150 kb. In further embodiments, the biallelic markers in the map are separated from one
another by an average distance of 50-100 kb. In yet another embodiment, the biallelic markers in the map are separated
from one another by an average distance of 25-50 kb. Maps having the above-specified intermarker distances which
comprise smaller portions of the genome, such as a set of chromosomes, a single chromosome, a particular
subchromosomal region, or any other desired portion of the genome, may also be constructed using the procedures
provided herein.
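
The relation between markers per BAC, map size and average spacing can be checked with a short calculation. The genome length of 3,000 Mb used below is an assumption introduced for illustration (it is not stated in this passage), as are the function names.

```python
GENOME_BP = 3_000_000_000  # assumed approximate length of the human genome
N_BACS = 20_000            # number of BAC inserts screened, as in the embodiments above


def map_size(markers_per_bac, n_bacs=N_BACS):
    """Total number of markers for a given average number of markers per BAC."""
    return markers_per_bac * n_bacs


def average_spacing_kb(n_markers, genome_bp=GENOME_BP):
    """Average inter-marker distance in kb if the markers span the whole genome."""
    return genome_bp / n_markers / 1000


for per_bac in range(1, 7):
    n = map_size(per_bac)
    print(per_bac, n, average_spacing_kb(n))
```

Under that assumption, one marker per BAC corresponds to an average spacing of 150 kb and six markers per BAC to 25 kb, which brackets the ranges recited above.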

Figure 2, showing the results of computer simulations of the distribution of inter-marker spacing on a randomly
distributed set of biallelic markers, indicates the percentage of biallelic markers which will be spaced a given distance
apart for a given number of markers/BAC in the genomic map (assuming 20,000 BACs constituting a minimally
overlapping array covering the entire genome are evaluated). One hundred iterations were performed for each
simulation (20,000 marker map, 40,000 marker map, 60,000 marker map, 120,000 marker map).

As illustrated in Figure 2a, 98% of inter-marker distances will be lower than 150 kb provided 60,000 evenly
distributed markers are generated (3 per BAC); 90% of inter-marker distances will be lower than 150 kb provided 40,000
evenly distributed markers are generated (2 per BAC); and 50% of inter-marker distances will be lower than 150 kb
provided 20,000 evenly distributed markers are generated (1 per BAC).

As illustrated in Figure 2b, 98% of inter-marker distances will be lower than 80 kb provided 120,000 evenly
distributed markers are generated (6 per BAC); 80% of inter-marker distances will be lower than 80 kb provided 60,000
evenly distributed markers are generated (3 per BAC); and 15% of inter-marker distances will be lower than 80 kb
provided 20,000 evenly distributed markers are generated (1 per BAC).
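
A simulation of this kind can be sketched as follows. The model is an assumption made here, since the text does not describe the simulation in detail: 20,000 contiguous, non-overlapping BACs of equal length (150 kb, also assumed), with a fixed number of markers placed uniformly at random inside each BAC. The sketch runs a single iteration rather than the one hundred mentioned above.

```python
import random


def simulate_spacings(markers_per_bac, n_bacs=20_000, bac_kb=150.0, seed=0):
    """Place markers uniformly at random within each BAC and return the spacings (kb)."""
    rng = random.Random(seed)
    positions = []
    for bac_index in range(n_bacs):
        start = bac_index * bac_kb
        positions.extend(start + rng.random() * bac_kb
                         for _ in range(markers_per_bac))
    positions.sort()
    # Distance between each marker and the next one along the genome.
    return [right - left for left, right in zip(positions, positions[1:])]


def fraction_below(spacings, threshold_kb):
    """Fraction of inter-marker distances strictly below the threshold."""
    return sum(s < threshold_kb for s in spacings) / len(spacings)


for per_bac in (1, 2, 3, 6):
    spacings = simulate_spacings(per_bac)
    print(per_bac, round(fraction_below(spacings, 150.0), 3),
          round(fraction_below(spacings, 80.0), 3))
```

Under these assumptions the fractions come out close to the percentages quoted above; for example, with one marker per BAC about half of the distances fall below 150 kb. A different BAC length or an overlapping array would shift the numbers.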

As already mentioned, high density biallelic marker maps allow association studies to be performed to identify
genes involved in complex traits.

Association studies examine the frequency of marker alleles in unrelated trait positive (T+) individuals
compared with trait negative (T-) controls, and are generally employed in the detection of polygenic inheritance.

Association studies as a method of mapping genetic traits rely on the phenomenon of linkage disequilibrium,
which is described below.

Linkage Disequilibrium

If two genetic loci lie on the same chromosome, then sets of alleles on the same chromosomal segment (called
haplotypes) tend to be transmitted as a block from generation to generation. When not broken up by recombination,
haplotypes can be tracked not only through pedigrees but also through populations. The resulting phenomenon at the
population level is that the occurrence of pairs of specific alleles at different loci on the same chromosome is not
random, and the deviation from random is called linkage disequilibrium (LD).

If a specific allele in a given gene is directly involved in causing a particular trait T, its frequency will be
statistically increased in a T+ population when compared to the frequency in a T- population. As a consequence of the
existence of LD, the frequency of all other alleles present in the haplotype carrying the trait-causing allele (TCA) will also
be increased in T+ individuals compared to T- individuals. Therefore, association between the trait and any allele in
linkage disequilibrium with the trait-causing allele will suffice to suggest the presence of a trait-related gene in that
particular allele's region. Linkage disequilibrium allows the relative frequencies in T+ and T- populations of a limited
number of genetic polymorphisms (specifically biallelic markers) to be analyzed as an alternative to screening all possible
functional polymorphisms in order to find trait-causing alleles.

The present invention then also concerns biallelic markers in linkage disequilibrium with the specific biallelic
markers described above and which are expected to present similar characteristics in terms of their respective
association with a given trait. In a preferred embodiment, the present invention concerns the biallelic markers that are in
linkage disequilibrium with the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50
and 51-100 or the sequences complementary thereto).

LD among a set of biallelic markers having an adequate heterozygosity rate can be determined by genotyping
between 50 and 1000 unrelated individuals, preferably between 75 and 200, more preferably around 100. Genotyping a
biallelic marker consists of determining the specific allele carried by an individual at the given polymorphic base of the
biallelic marker. Genotyping can be performed using similar methods as those described above for the generation of the
biallelic markers, or using other genotyping methods such as those further described below.

LD between any pair of biallelic markers comprising at least one of the biallelic markers of the present
invention ($M_i$, $M_j$) can be calculated for every allele combination ($M_{i1},M_{j1}$; $M_{i1},M_{j2}$; $M_{i2},M_{j1}$ and $M_{i2},M_{j2}$), according to the
Piazza formula:

$$\Delta_{M_{ik},M_{jl}} = \sqrt{\theta_4} - \sqrt{(\theta_4 + \theta_3)(\theta_4 + \theta_2)}$$, where:

$\theta_4$ = - - = frequency of genotypes not having allele k at $M_i$ and not having allele l at $M_j$

$\theta_3$ = - + = frequency of genotypes not having allele k at $M_i$ and having allele l at $M_j$
$\theta_2$ = + - = frequency of genotypes having allele k at $M_i$ and not having allele l at $M_j$
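
The Piazza calculation can be illustrated on genotype data as follows. This is a sketch under assumptions: the formula is taken as delta = sqrt(theta4) - sqrt((theta4 + theta3) * (theta4 + theta2)), "having" an allele is read as carrying at least one copy, and the function name and the two-letter genotype encoding are hypothetical.

```python
from math import sqrt


def piazza_delta(genotypes, allele_k, allele_l):
    """Piazza LD estimate for allele k at marker Mi and allele l at marker Mj.

    genotypes: list of (genotype_at_Mi, genotype_at_Mj) pairs, one per individual,
               each genotype a two-character string such as "CT".
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    neither = only_l = only_k = 0
    for g_i, g_j in genotypes:
        has_k = allele_k in g_i   # at least one copy of allele k at Mi
        has_l = allele_l in g_j   # at least one copy of allele l at Mj
        if not has_k and not has_l:
            neither += 1          # contributes to theta4 (- -)
        elif not has_k and has_l:
            only_l += 1           # contributes to theta3 (- +)
        elif has_k and not has_l:
            only_k += 1           # contributes to theta2 (+ -)
    theta4, theta3, theta2 = neither / n, only_l / n, only_k / n
    return sqrt(theta4) - sqrt((theta4 + theta3) * (theta4 + theta2))


linked = [("AA", "CC"), ("AA", "CC"), ("GG", "TT"), ("GG", "TT")]
unlinked = [("AA", "CC"), ("AA", "TT"), ("GG", "CC"), ("GG", "TT")]
print(piazza_delta(linked, "A", "C"), piazza_delta(unlinked, "A", "C"))
```

In the second data set the presence of allele A is independent of the presence of allele C, and the estimate is zero, as expected for markers in equilibrium.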

Linkage disequilibrium (LD) between pairs of biallelic markers (Mi, Mj) can also be calculated for every allele
combination (Mi1,Mj1; Mi1,Mj2; Mi2,Mj1; Mi2,Mj2) according to the maximum likelihood estimate (MLE) for delta (the
composite linkage disequilibrium coefficient), as described by Weir (B.S. Weir, Genetic Data Analysis, (1996), Sinauer
Ass. Eds, the disclosure of which is incorporated herein by reference). This formula allows linkage disequilibrium
between alleles to be estimated when only genotype, and not haplotype, data are available. This LD composite test
makes no assumption for random mating in the sampled population, and thus seems to be more appropriate than other
LD tests for genotypic data.
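
The text does not reproduce the composite estimator itself. The sketch below uses the form in which it is commonly written, delta = n_AB / n - 2 * p_A * p_B, where n_AB weights double homozygotes by 2, single heterozygotes by 1 and double heterozygotes by 1/2; treat that form, the function name and the genotype encoding as assumptions rather than as the exact procedure used here.

```python
def composite_ld(genotypes, allele_a, allele_b):
    """Composite LD coefficient from unphased genotypes (assumed standard form).

    genotypes: list of (genotype_at_Mi, genotype_at_Mj) pairs, each genotype a
               two-character string such as "CT".
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    n_ab = 0.0
    copies_a = copies_b = 0
    for g_i, g_j in genotypes:
        a = g_i.count(allele_a)   # 0, 1 or 2 copies of allele a at Mi
        b = g_j.count(allele_b)   # 0, 1 or 2 copies of allele b at Mj
        copies_a += a
        copies_b += b
        # a*b/2 gives weights 2, 1, 1 and 0.5 for the four carrier classes.
        n_ab += a * b / 2
    p_a = copies_a / (2 * n)
    p_b = copies_b / (2 * n)
    return n_ab / n - 2 * p_a * p_b


print(composite_ld([("CC", "TT"), ("CC", "TT"), ("GG", "AA"), ("GG", "AA")], "C", "T"))
print(composite_ld([("CG", "TA")] * 4, "C", "T"))
```

Only allele counts per individual are needed, which is why the estimate can be computed without knowing the phase of double heterozygotes.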

The skilled person will readily appreciate that other LD calculation methods can be used without undue
experimentation.

Example 10 illustrates the measurement of LD between a publicly known biallelic marker, the 'ApoE Site A',
located within the Alzheimer's related ApoE gene, and other biallelic markers randomly derived from the genomic region
containing the ApoE gene.

Example 10
Measurement of Linkage Disequilibrium

As originally reported by Strittmatter et al. and by Saunders et al. in 1993, the Apo E ε4 allele is strongly
associated with both late-onset familial and sporadic Alzheimer's disease (AD) (Saunders, A.M., Lancet 342: 710-711
(1993) and Strittmatter, W.J. et al., Proc. Natl. Acad. Sci. U.S.A. 90: 1977-1981 (1993), the disclosures of which are
incorporated herein by reference). The 3 major isoforms of human Apolipoprotein E (apoE2, -E3, and -E4), as identified by
isoelectric focusing, are coded for by 3 alleles (ε2, ε3, and ε4). The ε2, ε3, and ε4 isoforms differ in amino acid
sequence at 2 sites, residue 112 (called site A) and residue 158 (called site B). The ancestral isoform of the protein is
Apo E3, which at sites A/B contains cysteine/arginine, while ApoE2 and -E4 contain cysteine/cysteine and
arginine/arginine, respectively (Weisgraber, K.H. et al., J. Biol. Chem. 256: 9077-9083 (1981); Rall, S.C. et al., Proc.
Natl. Acad. Sci. U.S.A. 79: 4696-4700 (1982), the disclosures of which are incorporated herein by reference).

Apo E ε4 is currently considered as a major susceptibility risk factor for AD development in individuals of
different ethnic groups (especially in Caucasians and Japanese compared to Hispanics or African Americans), across all
ages between 40 and 90 years, and in both men and women, as reported recently in a study performed on 5930 AD
patients and 8607 controls (Farrer et al., JAMA 278:1349-1356 (1997), the disclosure of which is incorporated herein
by reference). More specifically, the frequency of a C base coding for arginine 112 at site A is significantly increased in
AD patients.

Although the mechanistic link between Apo E ε4 and neuronal degeneration characteristic of AD remains to be
established, current hypotheses suggest that the Apo E genotype may influence neuronal vulnerability by increasing the
deposition and/or aggregation of the amyloid beta peptide in the brain or by indirectly reducing energy availability to
neurons by promoting atherosclerosis.

Using the methods of the present invention, biallelic markers that are in the vicinity of the Apo E site A were
generated and the association of one of their alleles with Alzheimer's disease was analyzed. An Apo E public marker
(stSG94) was used to screen a human genome BAC library as previously described. A BAC, which gave a unique FISH
hybridization signal on chromosomal region 19q13.2.3, the chromosomal region harboring the Apo E gene, was selected
for finding biallelic markers in linkage disequilibrium with the Apo E gene as follows.

This BAC contained an insert of 205 kb that was subcloned as previously described. Fifty BAC subclones were
randomly selected and sequenced. Twenty five subclone sequences were selected and used to design twenty five pairs
of PCR primers allowing 500 bp amplicons to be generated. These PCR primers were then used to amplify the
corresponding genomic sequences in a pool of DNA from 100 unrelated individuals (blood donors of French origin) as
already described.

Amplification products from pooled DNA were sequenced and analyzed for the presence of biallelic
polymorphisms, as already described. Five amplicons were shown to contain a polymorphic base in the pool of 100
unrelated individuals, and therefore these polymorphisms were selected as random biallelic markers in the vicinity of the
Apo E gene. The sequences of both alleles of these biallelic markers (99-344/439; 99-355/219; 99-359/308;
99-365/344; 99-366/274) correspond to SEQ ID Nos: 301-305 and 307-311 (see the accompanying Sequence Listing and
Table 10). Corresponding pairs of amplification primers for generating amplicons containing these biallelic markers can
be chosen from those listed as SEQ ID Nos: 313-317 and 319-323.

An additional pair of primers (SEQ ID Nos: 318 and 324) was designed that allows amplification of the
genomic fragment carrying the biallelic polymorphism corresponding to the ApoE marker (99-2452/54; C/T; the C allele
is designated SEQ ID NO: 306 in the accompanying Sequence Listing, while the T allele is designated SEQ ID NO: 312 in
the accompanying Sequence Listing; see also Table 10), publicly known as Apo E site A (Weisgraber et al. (1981),
supra; Rall et al. (1982), supra).

The five random biallelic markers plus the Apo E site A marker were physically ordered by PCR screening of the
corresponding amplicons using all available BACs originally selected from the genomic DNA libraries, as previously
described, using the public Apo E marker stSG94. The amplicons' order derived from this BAC screening is as follows:

(99-344/99-366) - (99-365/99-2452) - 99-359 - 99-355,
where brackets indicate that the exact order of the respective amplicons could not be established.

Linkage disequilibrium among the six biallelic markers (five random markers plus the Apo E site A) was
determined by genotyping the same 100 unrelated individuals from whom the random biallelic markers were identified.

DNA samples and amplification products from genomic PCR were obtained in similar conditions as those
described above for the generation of biallelic markers, and subjected to automated microsequencing reactions using
fluorescent ddNTPs (specific fluorescence for each ddNTP) and the appropriate microsequencing primers having a 3' end
immediately upstream of the polymorphic base in the biallelic markers. The sequence of these microsequencing primers is
indicated within the corresponding sequence listings of SEQ ID Nos: 325-330. Once specifically extended at the 3' end
by a DNA polymerase using the complementary fluorescent dideoxynucleotide analog (thermal cycling), the
microsequencing primer was precipitated to remove the unincorporated fluorescent ddNTPs. The reaction products were
analyzed by electrophoresis on ABI 377 sequencing machines. Results were automatically analyzed by appropriate
software further described in Example 13.

Linkage disequilibrium (LD) between all pairs of biallelic markers (Mi, Mj) was calculated for every allele
combination (Mi1,Mj1; Mi1,Mj2; Mi2,Mj1; Mi2,Mj2) according to the maximum likelihood estimate (MLE) for delta (the
composite linkage disequilibrium coefficient). The results of the LD analysis between the Apo E Site A marker and the
five new biallelic markers (99-344/439; 99-355/219; 99-359/308; 99-365/344; 99-366/274) are summarized in Table
2 below:



Table 2

Markers          d x 100    SEQ ID Nos of the      SEQ ID Nos of the
                            biallelic markers      amplification primers

ApoE Site A                 306                    318
99-2452/54                  312                    324

99-344/439       1          301                    313
                            307                    319

99-366/274       1          305                    317
                            311                    323

99-365/344       8          304                    316
                            310                    322

99-359/308       2          303                    315
                            309                    321

99-355/219       1          302                    314
                            308                    320



The above LD results indicate that among the five biallelic markers randomly selected in a region of about 200
kb containing the Apo E gene, marker 99-365/344T is in relatively strong linkage disequilibrium with the Apo E site A
allele (99-2452/54C).
Therefore, since the Apo E site A allele is associated with Alzheimer's disease, one can predict that the T allele
of marker 99-365/344 will probably be found associated with AD. In order to test this hypothesis, the biallelic markers
of SEQ ID Nos: 301-306 and 307-312 were used in association studies as described below.

225 Alzheimer's disease patients were recruited according to clinical inclusion criteria based on the MMSE
test. The 248 control cases included in this study were both ethnically- and age-matched to the affected cases. Both
affected and control individuals corresponded to unrelated cases. The identities of the polymorphic bases of each of the
biallelic markers were determined in each of these individuals using the methods described above. Techniques for
conducting association studies are further described below.

The results of this study are summarized in Table 3 below:



Table 3

ASSOCIATION DATA

MARKER                      Difference in allele frequency          Corresponding p-value
                            between individuals with Alzheimer's
                            and control individuals

99-344/439                  3.3%                                    9.54 E-02
99-366/274                  1.6%                                    2.09 E-01
99-365/344                  17.7%                                   6.9 E-10
99-2452/54 (ApoE Site A)    23.8%                                   3.95 E-21
99-359/308                  0.4%                                    9.2 E-01
99-355/219                  2.5%                                    2.54 E-01


The frequency of the Apo E site A allele in both AD cases and controls was found to be in agreement with that
previously reported (ca. 10% in controls and ca. 34% in AD cases, leading to a 24% difference in allele frequency), thus
validating the Apo E e4 association in the populations used for this study.

Moreover, as predicted from the LD analysis (Table 2), a significant association of the T allele of marker
99-365/344 with AD cases (16% increase in the T allele frequency in AD cases compared to controls, p value for this
difference = 6.9 E-10) was observed.

The above results indicate that any marker in LD with one given marker associated with a trait will be
associated with the trait. It will be appreciated that, though in this case the ApoE Site A marker is the trait-causing
allele (TCA) itself, the same conclusion could be drawn with any other non-TCA marker associated with the studied trait.
These results further indicate that conducting association studies with a set of biallelic markers randomly
generated within a candidate region at a sufficient density (here about one biallelic marker every 40kb on average)
allows the identification of at least one marker associated with the trait.

In addition, these results correlate with the physical order of the six biallelic markers contemplated within the
present example (see above): marker 99-365/344, which had been found to be the closest in terms of physical distance
to the ApoE Site A marker, also shows the strongest LD with the Apo E site A marker.

In order to further refine the relationship between physical distance and linkage disequilibrium between biallelic
markers, a ca. 450 kb fragment from a genomic region on chromosome 8 was fully sequenced.

LD within ca. 230 pairs of biallelic markers derived therefrom was measured in a random French population
and analyzed as a function of the known physical inter-marker spacing. This analysis confirmed that, on average, LD
between 2 biallelic markers correlates with the physical distance that separates them. It further indicated that LD
between 2 biallelic markers tends to decrease when their spacing increases. More particularly, LD between 2 biallelic
markers tends to decrease when their inter-marker distance is greater than 50kb, and is further decreased when the
inter-marker distance is greater than 75kb. It was further observed that when 2 biallelic markers were further than
150kb apart, most often no significant LD between them could be evidenced. It will be appreciated that the size and
history of the sample population used to measure LD between markers may influence the distance beyond which LD
tends not to be detectable.
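
By way of illustration only, a composite linkage disequilibrium coefficient of the kind referred to above as delta may be estimated from unphased genotype data as in the following minimal Python sketch. The estimator shown (the average number of chromosomes-equivalents carrying both alleles, minus twice the product of the allele frequencies) is one commonly used form and is an assumption here; it is not asserted to be the exact calculation used to produce Table 2. The function name `composite_ld` and the 0/1/2 genotype coding are hypothetical.

```python
def composite_ld(genotypes):
    """Estimate a composite LD coefficient (delta) for two biallelic markers.

    genotypes: list of (g_i, g_j) pairs, one per unrelated individual, where
    g_i and g_j are the number of copies (0, 1 or 2) of allele 1 carried at
    markers Mi and Mj. This coding is an assumption of the sketch.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotyped individual is required")

    # Allele 1 frequencies at each marker (2 chromosomes per individual).
    p_i = sum(g_i for g_i, _ in genotypes) / (2 * n)
    p_j = sum(g_j for _, g_j in genotypes) / (2 * n)

    # Weighted count of joint occurrences of allele 1 at both markers:
    # double homozygotes count 2, single heterozygotes 1, double heterozygotes 0.5.
    n_joint = sum(g_i * g_j / 2.0 for g_i, g_j in genotypes)

    return n_joint / n - 2.0 * p_i * p_j


if __name__ == "__main__":
    # Hypothetical genotypes, for illustration only.
    sample = [(2, 2), (2, 2), (0, 0), (0, 0), (1, 1), (1, 0)]
    print(round(composite_ld(sample), 4))
```

A value near zero suggests that the two markers are inherited largely independently in the sample, whereas larger absolute values suggest association between their alleles.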

Assuming that LD can be measured between markers spanning regions up to an average of 150kb long, biallelic
marker maps will allow genome-wide LD mapping, provided they have an average inter-marker distance lower than
150kb.

Genome-wide LD mapping aims at identifying, for any TCA being searched, at least one biallelic marker in LD
with said TCA. Preferably, in order to enhance the power of LD maps, in some embodiments, the biallelic markers therein
have average inter-marker distances of 150kb or less, 75kb or less, 50kb or less, 30kb or less, or 25kb or less to
accommodate the fact that, in some regions of the genome, the detection of LD requires lower inter-marker distances.

The present invention provides methods to generate biallelic marker maps with average inter-marker distances
of 150kb or less. In some embodiments, the mean distance between biallelic markers constituting the high density map
will be less than 75kb, preferably less than 50kb. Further preferred maps according to the present invention contain
markers that are less than 37.5kb apart. In highly preferred embodiments, the average inter-marker spacing for the
biallelic markers constituting very high density maps is less than 30kb, most preferably less than 25kb.
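
The relationship between average inter-marker spacing and the number of markers in a genome-wide map can be sketched with simple arithmetic. The following Python fragment assumes a haploid genome size of about 3,000 Mb, a figure that is not stated in this passage and is used here only as a round working assumption; the function name `markers_needed` is likewise hypothetical.

```python
# Assumption for this sketch: a haploid human genome of roughly 3,000 Mb.
GENOME_SIZE_KB = 3_000_000


def markers_needed(spacing_kb, genome_kb=GENOME_SIZE_KB):
    """Approximate number of evenly spaced markers for a given average spacing."""
    if spacing_kb <= 0:
        raise ValueError("spacing must be positive")
    return round(genome_kb / spacing_kb)


if __name__ == "__main__":
    for spacing in (150, 75, 50, 37.5, 30, 25):
        print(f"{spacing} kb average spacing -> about {markers_needed(spacing):,} markers")
```

Under that assumption, spacings of 150, 75, 50, 37.5, 30 and 25 kb correspond to maps of about 20,000, 40,000, 60,000, 80,000, 100,000 and 120,000 markers respectively.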

Genetic maps containing biallelic markers (including the 653 biallelic markers obtained above, which include the
sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto) may be used to identify and
isolate genes associated with detectable traits. The use of the genetic maps of the present invention is described in
more detail below.

Use of the High Density Biallelic Marker Map to Identify
Genes Associated with a Detectable Trait
One embodiment of the present invention comprises methods for identifying and isolating genes associated 
with a detectable trait using the biallelic marker maps of the present invention. 

In the past, the identification of genes linked with detectable traits has relied on a statistical approach called
linkage analysis. Linkage analysis is based upon establishing a correlation between the transmission of genetic markers
and that of a specific trait throughout generations within a family. In this approach, all members of a series of affected
families are genotyped with a few hundred markers, typically microsatellite markers, which are distributed at an average
density of one every 10 Mb. By comparing genotypes in all family members, one can attribute sets of alleles to parental
haploid genomes (haplotyping or phase determination). The origin of recombined fragments is then determined in the
offspring of all families. Those that co-segregate with the trait are tracked. After pooling data from all families,
statistical methods are used to determine the likelihood that the marker and the trait are segregating independently in all
families. As a result of the statistical analysis, one or several regions having a high probability of harboring a gene linked
to the trait are selected as candidates for further analysis. The result of linkage analysis is considered as significant (i.e.
there is a high probability that the region contains a gene involved in a detectable trait) when the chance of independent
segregation of the marker and the trait is lower than 1 in 1000 (expressed as a LOD score > 3). Generally, the length
of the candidate region identified using linkage analysis is between 2 and 20Mb.
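
The LOD score threshold mentioned above can be illustrated with a minimal Python sketch. It assumes the simplest textbook situation, namely phase-known, fully informative meioses in which each offspring can be scored as recombinant or non-recombinant; real pedigree analyses are considerably more involved. The function name `lod_score` and its parameters are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 odds of linkage at recombination fraction theta versus no linkage.

    Assumes phase-known, fully informative meioses (an assumption of this
    sketch). theta must lie in (0, 0.5]; theta = 0.5 means independent
    segregation and gives a LOD score of 0.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


if __name__ == "__main__":
    # Hypothetical counts: no recombinants observed among 10 or 11 meioses.
    for n in (10, 11):
        print(n, round(lod_score(0, n, theta=0.01), 2))
```

A LOD score of 3 corresponds to odds of 1000 to 1 in favor of linkage; in this idealized setting, 11 non-recombinant meioses evaluated at theta = 0.01 exceed that threshold whereas 10 do not.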

Once a candidate region is identified as described above, analysis of recombinant individuals using additional 
markers allows further delineation of the candidate linked region. 

Linkage analysis studies have generally relied on the use of a maximum of 5,000 microsatellite markers, thus
limiting the maximum theoretical attainable resolution of linkage analysis to ca. 600 kb on average.

Linkage analysis has been successfully applied to map simple genetic traits that show clear Mendelian
inheritance patterns and which have a high penetrance (penetrance is the ratio between the number of trait positive
carriers of allele a and the total number of a carriers in the population). About 100 pathological trait-causing genes were
discovered using linkage analysis over the last 10 years. In most of these cases, the majority of affected individuals had
affected relatives and the detectable trait was rare in the general population (frequencies less than 0.1%). In about 10
cases, such as Alzheimer's Disease, breast cancer, and Type II diabetes, the detectable trait was more common but the
allele associated with the detectable trait was rare in the affected population. Thus, the alleles associated with these
traits were not responsible for the trait in all sporadic cases.

Linkage analysis suffers from a variety of drawbacks. First, linkage analysis is limited by its reliance on the
choice of a genetic model suitable for each studied trait. Furthermore, as already mentioned, the resolution attainable
using linkage analysis is limited, and complementary studies are required to refine the analysis of the typical 2Mb to
20Mb regions initially identified through linkage analysis.

In addition, linkage analysis approaches have proven difficult when applied to complex genetic traits, such as
those due to the combined action of multiple genes and/or environmental factors. In such cases, too large an effort and
cost are needed to recruit the adequate number of affected families required for applying linkage analysis to these
situations, as recently discussed by Risch, N. and Merikangas, K. (Science 273:1516-1517 (1996), the disclosure of
which is incorporated herein by reference).

Finally, linkage analysis cannot be applied to the study of traits for which no large informative families are
available. Typically, this will be the case in any attempt to identify trait-causing alleles involved in sporadic cases, such
as alleles associated with positive or negative responses to drug treatment.

The present genetic maps and biallelic markers (including the 653 biallelic markers obtained above, which
include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto) may be used to
identify and isolate genes associated with detectable traits using association studies, an approach which does not
require the use of affected families and which permits the identification of genes associated with sporadic traits.

Association studies are described in more detail below.

Association Studies 

As already mentioned, any gene responsible or partly responsible for a given trait will be in LD with some
flanking markers. To map such a gene, specific alleles of these flanking markers which are associated with the gene or
genes responsible for the trait are identified. Although the following discussion of techniques for finding the gene or
genes associated with a particular trait using linkage disequilibrium mapping refers to locating a single gene which is
responsible for the trait, it will be appreciated that the same techniques may also be used to identify genes which are
partially responsible for the trait.

Association studies may be conducted within the general population (as opposed to the linkage analysis
techniques discussed above, which are limited to studies performed on related individuals in one or several affected
families).

Association between a biallelic marker A and a trait T may primarily occur as a result of three possible
relationships between the biallelic marker and the trait.

First, allele a of biallelic marker A may be directly responsible for trait T (e.g., Apo E e4 site A and Alzheimer's
disease). However, since the majority of the biallelic markers used in genetic mapping studies are selected randomly,
they mainly map outside of genes. Thus, the likelihood of allele a being a functional mutation directly related to trait T is
very low.

Second, an association between a biallelic marker A and a trait T may also occur when the biallelic marker is
very closely linked to the trait locus. In other words, an association occurs when allele a is in linkage disequilibrium with
the trait-causing allele. When the biallelic marker is in close proximity to a gene responsible for the trait, more extensive
genetic mapping will ultimately allow a gene to be discovered near the marker locus which carries mutations in people
with trait T (i.e. the gene responsible for the trait or one of the genes responsible for the trait). As will be further
exemplified below, using a group of biallelic markers which are in close proximity to the gene responsible for the trait, the
location of the causal gene can be deduced from the profile of the association curve between the biallelic markers and
the trait. The causal gene will usually be found in the vicinity of the marker showing the highest association with the
trait.

Finally, an association between a biallelic marker and a trait may occur when people with the trait and people
without the trait correspond to genetically different subsets of the population who, coincidentally, also differ in the
frequency of allele a (population stratification). This phenomenon may be avoided by using ethnically matched large
heterogeneous samples.

Association studies are particularly suited to the efficient identification of genes that present common
polymorphisms, and are involved in multifactorial traits whose frequency is relatively higher than that of diseases with
monofactorial inheritance.

Association studies mainly consist of four steps: recruitment of trait-positive (T+) and trait-negative (T-)
populations with well-defined phenotypes, identification of a candidate region suspected of harboring a trait-causing
gene, identification of said gene among candidate genes in the region, and finally validation of mutation(s) responsible for
the trait in said trait-causing gene.

In a first step, trait + and trait - phenotypes have to be well-defined. In order to perform efficient and
significant association studies such as those described herein, the trait under study should preferably follow a bimodal
distribution in the population under study, presenting two clear non-overlapping phenotypes, trait + and trait -.

Nevertheless, in the absence of such a bimodal distribution (as may in fact be the case for complex genetic
traits), any genetic trait may still be analyzed using the association method proposed herein by carefully selecting the
individuals to be included in the trait + and trait - phenotypic groups. The selection procedure involves selecting
individuals at opposite ends of the non-bimodal phenotype spectrum of the trait under study, so as to include in these
trait + and trait - populations individuals who clearly represent non-overlapping, preferably extreme phenotypes.

The definition of the inclusion criteria for the trait + and trait - populations is an important aspect of the
present invention. The selection of those drastically different but relatively uniform phenotypes enables efficient
comparisons in association studies and the possible detection of marked differences at the genetic level, provided that
the sample sizes of the populations under study are significant enough.

Generally, trait + and trait - populations to be included in association studies such as those proposed in the
present invention consist of phenotypically homogeneous populations of individuals each representing 100% of the
corresponding phenotype if the trait distribution is bimodal. If the trait distribution is non-bimodal, trait + and trait -
populations consist of phenotypically uniform populations of individuals representing each between 1 and
preferably between 1 and 80%, more preferably between 1 and 50%, and more preferably between 1 and 30%, most
preferably between 1 and 20% of the total population under study, and selected among individuals exhibiting
non-overlapping phenotypes. In some embodiments, the T+ and T- groups consist of individuals exhibiting the extreme
phenotypes within the studied population. The clearer the difference between the two trait phenotypes, the greater the
probability of detecting an association with biallelic markers.
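
For a trait with a non-bimodal distribution, the selection of individuals at opposite ends of the phenotype spectrum may be sketched as follows. This Python fragment is illustrative only: it assumes the trait can be summarized as a single numeric value per individual, that higher values correspond to the trait + phenotype, and that the same fraction is taken from each end. The function name `select_extremes` is hypothetical.

```python
def select_extremes(phenotypes, fraction=0.2):
    """Pick non-overlapping groups from the two ends of a quantitative trait.

    phenotypes: dict mapping an individual identifier to a numeric trait value.
    fraction:   share of the studied population placed in each group
                (e.g. 0.2 for the lowest 20% and the highest 20%).
    Returns (trait_minus_ids, trait_plus_ids); treating high values as
    trait + is an assumption of this sketch.
    """
    if not 0.0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the groups cannot overlap")
    ranked = sorted(phenotypes, key=phenotypes.get)
    group_size = max(1, int(len(ranked) * fraction))
    return ranked[:group_size], ranked[-group_size:]


if __name__ == "__main__":
    # Hypothetical trait values, for illustration only.
    values = {"ind%02d" % i: float(i) for i in range(10)}
    trait_minus, trait_plus = select_extremes(values, fraction=0.2)
    print(trait_minus, trait_plus)
```

Restricting the fraction to one half or less keeps the two groups disjoint; smaller fractions give more sharply contrasted phenotypes at the cost of smaller samples.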

In preferred embodiments, a first group of between 50 and 300 trait + individuals, preferably about 100
individuals, are recruited according to their phenotypes. In each case, a similar number of trait negative individuals are
included in such studies who are preferably both ethnically- and age-matched to the trait positive cases. Both trait + and
trait - individuals should correspond to unrelated cases.

Figure 3 shows, for a series of hypothetical sample sizes, the p-value significance obtained in association
studies performed using individual markers from the high-density biallelic map, according to various hypotheses regarding
the difference of allelic frequencies between the T+ and T- samples. It indicates that, in all cases, samples ranging from
150 to 500 individuals are numerous enough to achieve statistical significance. It will be appreciated that bigger or
smaller groups can be used to perform association studies according to the methods of the present invention.

In a second step, a marker/trait association study is performed that compares the genotype frequency of each
biallelic marker in the above described T+ and T- populations by means of a chi square statistical test (one degree of
freedom). In addition to this single marker association analysis, a haplotype association analysis is performed to define
the frequency and the type of the ancestral carrier haplotype. Haplotype analysis, by combining the informativeness of a
set of biallelic markers, increases the power of the association analysis, allowing false positive and/or negative data that
may result from the single marker studies to be eliminated.
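
A single marker chi square test with one degree of freedom may be sketched as follows. The Python fragment below applies the test to a 2 x 2 table of allele counts in the T+ and T- groups, which is one way of obtaining a one-degree-of-freedom comparison and is an assumption of this sketch rather than a statement of the exact procedure used. The function name `allelic_chi_square` and the counts in the usage example are hypothetical.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi square (1 df) on a 2 x 2 table of allele counts.

    case_a, case_b: counts of the two alleles among T+ chromosomes.
    ctrl_a, ctrl_b: counts of the two alleles among T- chromosomes.
    Returns (chi2, p_value). No continuity correction is applied.
    """
    total = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("each row and column of the table must be non-empty")
    chi2 = total * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b)
    # For one degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    chi2, p = allelic_chi_square(60, 40, 40, 60)
    print(round(chi2, 2), p)
```

Using the complementary error function avoids any dependency outside the standard library; a statistics package could equally be used.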

Genotyping can be performed using the microsequencing procedure described in Example 13, or any other 
genotyping procedure suitable for this intended purpose. 

If a positive association with a trait is identified using an array of biallelic markers having a high enough
density, the causal gene will be physically located in the vicinity of the associated markers, since the markers showing
positive association with the trait are in linkage disequilibrium with the trait locus. Regions harboring a gene responsible
for a particular trait which are identified through association studies using high density sets of biallelic markers will, on
average, be 20 - 40 times shorter in length than those identified by linkage analysis.

Once a positive association is confirmed as described above, a third step consists of completely sequencing the
BAC inserts harboring the markers identified in the association analyses. These BACs are obtained through screening
human genomic libraries with the marker probes and/or primers, as described above. Once a candidate region has been
sequenced and analyzed, the functional sequences within the candidate region (e.g. exons, splice sites, promoters, and
other potential regulatory regions) are scanned for mutations which are responsible for the trait by comparing the
sequences of the functional regions in a selected number of T+ and T- individuals using appropriate software. Tools for
sequence analysis are further described in Example 14.

Finally, candidate mutations are then validated by screening a larger population of T+ and
T- individuals using genotyping techniques described below. Polymorphisms are confirmed as
candidate mutations when the validation population shows association results compatible with those
found between the mutation and the trait in the test population.

In practice, in order to define a region bearing a candidate gene, the trait + and trait - populations are
genotyped using an appropriate number of biallelic markers. The markers may include one or more of the 653 markers
obtained above (which include the sequences of SEQ ID Nos: 1-50 and 51-100 or the sequences complementary thereto).

The markers used to define a region bearing a candidate gene may be distributed at an average density of 1
marker per 10-200 kb. Preferably, the markers used to define a region bearing a candidate gene are distributed at an
average density of 1 marker every 15-750 kb. In further preferred embodiments, the markers used to define a region
bearing a candidate gene are distributed at an average density of 1 marker every 20-100kb. In yet another preferred
embodiment, the markers used to define a region bearing a candidate gene are distributed at an average density of 1
marker every 100 to 150kb. In a further highly preferred embodiment, the markers used to define a region bearing a
candidate gene are distributed at an average density of 1 marker every 50 to 100kb. In yet another embodiment, the
biallelic markers used to define a region bearing a candidate gene are distributed at an average density of 1 marker every
25-50 kilobases. As mentioned above, in order to enhance the power of linkage disequilibrium based maps, in a preferred
embodiment, the marker density of the map will be adapted to take the linkage disequilibrium distribution in the genomic
region of interest into account.

In some embodiments, the initial identification of a candidate genomic region harboring a gene associated with
a detectable phenotype may be conducted using a preliminary map containing a few thousand biallelic markers.
Thereafter, the genomic region harboring the gene responsible for the detectable trait may be better delineated using a
map containing a larger number of biallelic markers. Furthermore, the genomic region harboring the gene responsible for
the detectable trait may be further delineated using a high density map of biallelic markers. Finally, the gene associated
with the detectable trait may be identified and isolated using a very high density biallelic marker map.

Example 11 describes a hypothetical procedure for identifying a candidate region harboring a gene associated
with a detectable trait. It will be appreciated that although Example 11 compares the results of analyses using markers
derived from maps having 3,000, 20,000, and 60,000 markers, the number of markers contained in the map is not
restricted to these exemplary figures. Rather, Example 11 exemplifies the increasing refinement of the candidate region
with increasing marker density. As increasing numbers of markers are used in the analysis, points in the association
analysis become broad peaks. The gene associated with the detectable trait under investigation will lie within or near
the region under the peak.

Example 11

Identification of a Candidate Region Harboring a
Gene Associated with a Detectable Trait
The initial identification of a candidate genomic region harboring a gene associated with a detectable trait may
be conducted using a genome-wide map comprising about 20,000 biallelic markers. The candidate genomic region may
be further defined using a map having a higher marker density, such as a map comprising about 40,000 markers, about
60,000 markers, about 80,000 markers, about 100,000 markers, or about 120,000 markers.

The use of high density maps such as those described above allows the identification of genes which are truly
associated with detectable traits, since the coincidental associations will be randomly distributed along the genome
while the true associations will map within one or more discrete genomic regions. Accordingly, biallelic markers located
in the vicinity of a gene associated with a detectable trait will give rise to broad peaks in graphs plotting the frequencies
of the biallelic markers in T+ individuals versus T- individuals. In contrast, biallelic markers which are not in the vicinity
of the gene associated with the detectable trait will produce unique points in such a plot. By determining the
association of several markers within the region containing the gene associated with the detectable trait, the gene
associated with the detectable trait can be identified using an association curve which reflects the difference between
the allele frequencies within the T+ and T- populations for each studied marker. The gene associated with the
detectable trait will be found in the vicinity of the marker showing the highest association with the trait.

Figures 4, 5, and 6 illustrate the above principles. As illustrated in Figure 4, an association analysis conducted
with a map comprising about 3,000 biallelic markers yields a group of points. However, when an association analysis is
performed using a denser map which includes additional biallelic markers, the points become broad peaks indicative of
the location of a gene associated with a detectable trait. For example, the biallelic markers used in the initial association
analysis may be obtained from a map comprising about 20,000 biallelic markers, as illustrated in Figure 5. In some
embodiments, one or more of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50
and 51-100 or the sequences complementary thereto) are used in the association analysis.

In the hypothetical example of Figure 4, the association analysis with 3,000 markers suggests peaks near
markers 9 and 17.

Next, a second analysis is performed using additional markers in the vicinity of markers 9 and 17, as illustrated
in the hypothetical example of Figure 5, using a map of about 20,000 markers. This step again indicates an association
in the close vicinity of marker 17, since more markers in this region show an association with the trait. However, none
of the additional markers around marker 9 shows a significant association with the trait, which makes marker 9 a
potential false positive. In some embodiments, one or more of the 653 biallelic markers obtained above (which include
the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto) are used in the second
analysis. In order to further test the validity of these two suspected associations, a third analysis may be obtained with
a map comprising about 60,000 biallelic markers. In some embodiments, one or more of the 653 biallelic markers
obtained above are used in the third association analysis. In the hypothetical example of Figure 6, more markers lying
around marker 17 exhibit a high degree of association with the detectable trait. Conversely, no association is confirmed
in the vicinity of marker 9. The genomic region surrounding marker 17 can thus be considered a candidate region for the
hypothetical trait of this simulation.
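
The distinction drawn above between an isolated point (such as hypothetical marker 9) and a broad peak (such as the region around hypothetical marker 17) can be sketched in Python as follows. The rule used here, that a significant marker counts as part of a peak only if at least one other significant marker lies within a given window, is a simplifying assumption of this sketch, as are the default p-value threshold, the 150 kb window, and the function name `classify_hits`.

```python
def classify_hits(markers, p_threshold=1e-3, window_kb=150.0):
    """Label each significant marker as part of a 'peak' or as 'isolated'.

    markers: list of (name, position_kb, p_value) tuples.
    A marker is significant if its p-value is below p_threshold. It is
    labelled 'peak' when another significant marker lies within window_kb,
    and 'isolated' otherwise. Both parameters are illustrative assumptions.
    """
    hits = [m for m in markers if m[2] < p_threshold]
    labels = {}
    for name, position, _ in hits:
        supported = any(
            other_name != name and abs(position - other_position) <= window_kb
            for other_name, other_position, _ in hits
        )
        labels[name] = "peak" if supported else "isolated"
    return labels


if __name__ == "__main__":
    # Hypothetical positions and p-values, for illustration only.
    example = [
        ("marker9", 1000.0, 1e-4),
        ("marker9_neighbour", 1050.0, 0.4),
        ("marker17", 5000.0, 1e-5),
        ("marker17_left", 4960.0, 5e-4),
        ("marker17_right", 5040.0, 1e-4),
    ]
    print(classify_hits(example))
```

In this toy input the marker standing alone is flagged as isolated, and hence as a potential false positive, while the three neighbouring significant markers are grouped as a peak.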

The statistical power of LD mapping using a high density marker map is also reinforced by complementing the
single point association analysis described above with a multi-marker association analysis, called haplotype analysis.

When a chromosome carrying a disease allele is first introduced into a population as a result of either mutation
or migration, the mutant allele necessarily resides on a chromosome having a unique set of linked markers: the ancestral
haplotype. As already mentioned, a haplotype association analysis allows the frequency and the type of the ancestral
carrier haplotype to be defined.

A haplotype analysis is performed by estimating the frequencies of all possible haplotypes for a given set of
biallelic markers in the T+ and T- populations, and comparing these frequencies by means of a chi square statistical test
(one degree of freedom). Haplotype estimations are usually performed by applying the Expectation-Maximization (EM)
algorithm (Excoffier L and Slatkin M, Mol. Biol. Evol. 12:921-927 (1995), the disclosure of which is incorporated herein
by reference), using the EM-HAPLO program (Hawley ME, Pakstis AJ & Kidd KK, Am. J. Phys. Anthropol. 18:104
(1994), the disclosure of which is incorporated herein by reference). The EM algorithm is used to estimate haplotype
frequencies in the case when only genotype data from unrelated individuals are available. The EM algorithm is a
generalized iterative maximum likelihood approach to estimation that is useful when data are ambiguous and/or
incomplete.
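
The principle of EM-based haplotype frequency estimation can be illustrated for the simplest case of two biallelic markers, where only double heterozygotes have ambiguous phase. The Python sketch below is not the EM-HAPLO program and makes no claim to reproduce it; the function name `em_haplotype_frequencies`, the allele labels A/a and B/b, and the 0/1/2 genotype coding are assumptions made for illustration.

```python
def em_haplotype_frequencies(genotypes, iterations=100, tol=1e-10):
    """Estimate two-marker haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g_a, g_b) pairs giving the number of copies (0, 1, 2)
    of allele A at the first marker and allele B at the second marker.
    Returns a dict with frequencies of haplotypes 'AB', 'Ab', 'aB', 'ab'.
    """
    if not genotypes:
        raise ValueError("at least one genotyped individual is required")
    n_chromosomes = 2 * len(genotypes)
    known = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    double_hets = 0
    for g_a, g_b in genotypes:
        if g_a == 1 and g_b == 1:
            double_hets += 1  # phase ambiguous: AB/ab or Ab/aB
            continue
        # With at least one homozygous marker the two haplotypes are determined.
        first = ("A" if g_a >= 1 else "a") + ("B" if g_b >= 1 else "b")
        second = ("A" if g_a == 2 else "a") + ("B" if g_b == 2 else "b")
        known[first] += 1
        known[second] += 1

    freqs = {h: 0.25 for h in known}  # uninformative starting point
    for _ in range(iterations):
        # E-step: expected share of double heterozygotes carrying AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from known plus expected haplotype counts.
        updated = {
            "AB": (known["AB"] + w * double_hets) / n_chromosomes,
            "ab": (known["ab"] + w * double_hets) / n_chromosomes,
            "Ab": (known["Ab"] + (1 - w) * double_hets) / n_chromosomes,
            "aB": (known["aB"] + (1 - w) * double_hets) / n_chromosomes,
        }
        change = max(abs(updated[h] - freqs[h]) for h in updated)
        freqs = updated
        if change < tol:
            break
    return freqs


if __name__ == "__main__":
    # Hypothetical genotypes, for illustration only.
    sample = [(2, 2)] * 3 + [(0, 0)] * 3 + [(1, 1)] * 4
    print(em_haplotype_frequencies(sample))
```

Estimating frequencies separately in the T+ and T- groups and comparing them is then a matter of applying a test such as the chi square test to the resulting haplotype counts. With more than two markers the number of possible haplotypes and ambiguous phases grows quickly, which is why dedicated programs are used in practice.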

To improve the statistical power of the individual marker association analyses conducted as described above
using maps of increasing marker densities, haplotype studies can be performed using groups of markers located in
proximity to one another within regions of the genome. For example, using the methods described above in which the
association of an individual marker with a detectable phenotype was analyzed using maps of 3,000 markers, 20,000
markers, and 60,000 markers, a series of haplotype studies can be performed using groups of contiguous markers from
such maps or from maps having higher marker densities.

In a preferred embodiment, a series of successive haplotype studies including groups of markers spanning
regions of more than 1 Mb may be performed. In some embodiments, the biallelic markers included in each of these
groups may be located within a genomic region spanning less than 1kb, from 1 to 5kb, from 5 to 10kb, from 10 to 25kb,
from 25 to 50kb, from 50 to 150kb, from 150 to 250kb, from 250 to 500kb, from 500kb to 1Mb, or more than 1Mb.
Preferably, the genomic regions containing the groups of biallelic markers used in the successive haplotype analyses are
overlapping. It will be appreciated that the groups of biallelic markers need not completely cover the genomic regions of the
above specified lengths but may instead be obtained from incomplete contigs having one or more gaps therein. As discussed
in further detail below, biallelic markers may be used in single point and haplotype association analyses regardless of the
completeness of the corresponding physical contig harboring them.

Without wishing to be limited to any particular numerical value, it is believed that those haplotypes displaying a
coefficient of relative risk above 1, preferably about 5 or more, preferably of about 7 or more are indicative of a
'significant risk' for the individuals carrying the identified haplotype to develop the given trait. However, it is difficult to
evaluate accurately quantified boundaries for the so-called 'significant risk'. Indeed, and as it has been demonstrated
previously, several traits observed in a given population are multifactorial in that they are not only the result of a single
genetic predisposition but also of other factors such as environmental factors. Thus, the evaluation of a significant risk
must take these parameters into consideration in order to, in a certain manner, weigh the potential importance of
external parameters in the development of a given trait. Thus, the relative risk which constitutes a 'significant risk' to
develop a given trait is evaluated differently depending on the trait under consideration and the populations tested.

Genome-wide mapping using association studies with dense enough arrays of markers permits a case-by-case
best estimate of p-value significance thresholds. Given a test population comprising two ethnically matched trait
positive and trait negative groups of about 50 to about 500 individuals or more, conducting the above described
association studies will allow a p-value 'cut-off' to be established by, for example, analyzing significant numbers of
allele frequency differences or, in some cases where appropriate, running computer simulations or control studies as
described in Examples 11, 20, and 31.

For a p-value above the threshold, a corresponding association between the trait and a studied marker will be
deemed not significant, while for a p-value below such a threshold, said association will be deemed significant. If the
p-value is significant, the genomic region around the marker will be further scrutinized for a trait-causing gene.

It is preferred that p-value significance thresholds be assessed for each case/control population comparison.
Both the genetic distance between sampled populations ('stratification') and the dispersion due to random selection of
samples may indeed influence the p-value significance thresholds.
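
One way in which a computer simulation might be used to derive a p-value cut-off for a particular case/control comparison is a label permutation procedure, sketched below in Python. This is offered only as an illustration of the general idea: the permutation approach, the use of the smallest p-value across markers in each permuted data set, and the names `allelic_p_value` and `empirical_threshold` are all assumptions of this sketch and are not taken from the Examples referred to above.

```python
import math
import random


def allelic_p_value(case_counts, ctrl_counts):
    """One-degree-of-freedom chi square p-value on allele counts.

    case_counts, ctrl_counts: per-individual counts (0, 1, 2) of allele 1.
    """
    a = sum(case_counts)
    b = 2 * len(case_counts) - a
    c = sum(ctrl_counts)
    d = 2 * len(ctrl_counts) - c
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 1.0  # empty group or monomorphic marker: no evidence either way
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denominator
    return math.erfc(math.sqrt(chi2 / 2.0))


def empirical_threshold(genotypes, is_case, n_perm=200, alpha=0.05, seed=0):
    """Estimate a p-value cut-off by shuffling case/control labels.

    genotypes: one list per individual giving allele 1 counts at each marker.
    is_case:   one boolean per individual (True for T+, False for T-).
    For each permutation the smallest p-value over all markers is recorded;
    the cut-off is the alpha quantile of those minima.
    """
    rng = random.Random(seed)
    labels = list(is_case)
    n_markers = len(genotypes[0])
    minima = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        smallest = 1.0
        for m in range(n_markers):
            cases = [g[m] for g, lab in zip(genotypes, labels) if lab]
            ctrls = [g[m] for g, lab in zip(genotypes, labels) if not lab]
            smallest = min(smallest, allelic_p_value(cases, ctrls))
        minima.append(smallest)
    minima.sort()
    return minima[max(0, int(alpha * n_perm) - 1)]


if __name__ == "__main__":
    # Randomly generated genotypes, for illustration only.
    rng = random.Random(1)
    genos = [[rng.choice((0, 1, 2)) for _ in range(20)] for _ in range(100)]
    labels = [i < 50 for i in range(100)]
    print(empirical_threshold(genos, labels))
```

Because the labels are shuffled within the actual sample, a cut-off obtained this way reflects the sample sizes and marker set of the comparison at hand, which is in keeping with assessing thresholds separately for each case/control comparison.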

It will be appreciated that the above approaches may be conducted on any scale (i.e. over the whole genome, a
set of chromosomes, a single chromosome, a particular subchromosomal region, or any other desired portion of the
genome). As mentioned above, once significance thresholds have been assessed, population sample sizes may be
adapted as exemplified in Figure 3.

Example 12 below illustrates the increase in statistical power brought to an association study by a haplotype
analysis.

Example 12

Haplotype Analysis: Identification of biallelic markers delineating
a genomic region associated with Alzheimer's Disease (AD)

As shown in Table 3 within Example 10, at an average map density of one marker per 40 kb only one marker
(99-365/344) out of five random biallelic markers from a ca. 200 kb genomic region around the Apo E gene showed a
clear association to AD (delta allelic frequency in cases and controls = 18%; p value = 6.9 E-10). The allelic frequencies
of the other four random markers were not significantly different between AD cases and controls (p-values ≥ E-01).
However, since linkage disequilibrium can usually be detected between markers located further apart than an average 40
kb as previously discussed, one should expect that performing an association study with a local excerpt of a biallelic
marker map covering ca. 200 kb with an average inter-marker distance of ca. 40 kb should allow the identification of
more than one biallelic marker associated with AD.

A haplotype analysis was thus performed using the biallelic markers 99-344/439; 99-355/219; 99-359/308;
99-365/344; and 99-366/274 (of SEQ ID Nos: 301-305 and 307-311).

In a first step, marker 99-365/344 that was already found associated with AD was not included in the
haplotype study. Only biallelic markers 99-344/439; 99-355/219; 99-359/308; and 99-366/274, which did not show
any significant association with AD when taken individually, were used. This first haplotype analysis measured
frequencies of all possible two-, three-, or four-marker haplotypes in the AD case and control populations. As shown in
Figure 7, there was one haplotype among all the potential different haplotypes based on the four individually non-significant
markers ("haplotype 8", TAGG, comprising SEQ ID No. 305 which is the T allele of marker 99-366/274, SEQ
ID No. 301 which is the A allele of marker 99-344/439, SEQ ID No. 303 which is the G allele of marker 99-359/308 and
SEQ ID No. 302 which is the G allele of marker 99-355/219) that was present at statistically significantly different
frequencies in the AD case and control populations (Δ = 12%; p value = 2.05 E-06). Moreover, a significant difference
was already observed for a three-marker haplotype included in the above mentioned "haplotype 8" ("haplotype 7", TGG,
Δ = 10%; p value = 4J6 E-05). Haplotype 7 comprises SEQ ID No. 305 which is the T allele of marker 99-366/274,
SEQ ID No. 303 which is the G allele of marker 99-359/308 and SEQ ID No. 302 which is the G allele of marker
99-355/219. The haplotype association analysis thus clearly increased the statistical power of the individual marker
association studies by more than four orders of magnitude when compared to single-marker analysis (from p values ≥ E-01
for the individual markers - see Table 3 - to a p value of about 2 E-06 for the four-marker "haplotype 8").

The significance of the values obtained for this haplotype association analysis was evaluated by the following
computer simulation. The genotype data from the AD cases and the unaffected controls were pooled and randomly
allocated to two groups which contained the same number of individuals as the case/control groups used to produce the
data summarized in Figure 7. A four-marker haplotype analysis (99-344/439; 99-355/219; 99-359/308; and
99-366/274) was run on these artificial groups. This experiment was reiterated 100 times and the results are shown in
Figure 8. No haplotype among those generated was found for which the p-value of the frequency difference between
both populations was more significant than 1 E-05. In addition, only 4% of the generated haplotypes showed p values
lower than 1 E-04. Since both these p-value thresholds are less significant than the 2 E-06 p-value shown by
"haplotype 8", this haplotype can be considered significantly associated with AD.
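
The pooling and random reallocation described above can be sketched as a permutation procedure. The sketch below is a minimal illustration, not the software actually used: it assumes each individual is represented by a pair of already phased haplotype strings, it assumes a chi-square test (1 d.f.) on haplotype counts, and all function names are hypothetical.

```python
import math
import random


def haplotype_p_value(target):
    """Build a statistic returning the p-value for the frequency difference of one haplotype.

    Each individual is a tuple of two haplotype strings (phasing is assumed known).
    The chi-square test used here is an assumption of this sketch.
    """
    def statistic(group_a, group_b):
        a = sum(h == target for ind in group_a for h in ind)
        c = sum(h == target for ind in group_b for h in ind)
        b = 2 * len(group_a) - a
        d = 2 * len(group_b) - c
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            return 1.0
        chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
        return math.erfc(math.sqrt(chi2 / 2.0))
    return statistic


def permutation_p_values(cases, controls, statistic, n_iter=100, seed=0):
    """Pool both groups, reallocate at random to groups of the original sizes, and
    recompute the statistic; repeated n_iter times (100 in the text)."""
    rng = random.Random(seed)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    simulated = []
    for _ in range(n_iter):
        rng.shuffle(pooled)
        simulated.append(statistic(pooled[:n_cases], pooled[n_cases:]))
    return simulated


def fraction_at_least_as_significant(simulated, observed):
    """Share of simulated p-values that are as small as, or smaller than, the observed one."""
    return sum(1 for p in simulated if p <= observed) / len(simulated)
```

If no simulated group pair reaches the observed p-value, as reported for "haplotype 8", the returned fraction is zero.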

In a second step, marker 99-365/344 was included in the haplotype analyses. The frequency differences
between the affected and non-affected populations were calculated for all two-, three-, four- or five-marker haplotypes
involving markers: 99-344/439; 99-355/219; 99-359/308; 99-366/274; and 99-365/344. The most significant
p-values obtained in each category of haplotype (involving two, three, four or five markers) were examined depending on
which markers were involved or not within the haplotype. This showed that all haplotypes which included marker
99-365/344 showed a significant association with AD (p values in the range of E-04 to E-11).
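
The enumeration of marker combinations examined in this second step can be sketched as follows. The marker identifiers are those of the text; the function names are hypothetical, and the sketch only lists which marker sets are considered (it does not estimate haplotype frequencies).

```python
from itertools import combinations

MARKERS = ["99-344/439", "99-355/219", "99-359/308", "99-366/274", "99-365/344"]


def marker_subsets(markers, min_size=2, max_size=None):
    """Yield every combination of min_size to max_size markers."""
    max_size = len(markers) if max_size is None else max_size
    for size in range(min_size, max_size + 1):
        yield from combinations(markers, size)


def split_by_marker(subsets, marker):
    """Separate the marker sets that involve a given marker from those that do not."""
    with_marker = [s for s in subsets if marker in s]
    without_marker = [s for s in subsets if marker not in s]
    return with_marker, without_marker
```

For five markers this gives 26 marker sets of two to five markers, 15 of which involve marker 99-365/344.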

An additional way of evaluating the significance of the values obtained in the haplotype association analysis
was to perform a similar AD case-control study on biallelic markers generated from BACs containing inserts
corresponding to genomic regions derived from chromosomes 13 or 21 and not known to be involved in Alzheimer's
disease. Performing similar haplotype and individual association analyses as those described above and in Example 10
did not generate any significant association results (all p-values for haplotype analyses were less significant than E-03;
all p-values for single marker association studies were less significant than E-02).

The results described in Examples 10 and 12, generated from individual and haplotype studies using a biallelic
marker set of an average density equal to ca. 40 kb in the region of an Alzheimer's disease trait causing gene, indicate
that all biallelic markers of sufficient informative content located within a ca. 200 kb genomic region around a TCG can
potentially be successfully used to localize a trait causing gene with the methods provided by the present invention. This
conclusion is further supported by the results obtained through measuring the linkage disequilibrium between markers
99-365/344 or 99-359/308 and the ApoE 4 Site A marker within Alzheimer's patients: as one could predict since LD is the
supporting basis for association studies, LD between these pairs of markers was enhanced in the diseased population vs.
the control population, in a similar way as the haplotype analysis enhanced the significance of the corresponding
association studies.

Once a given polymorphic site has been found and characterized as a biallelic marker according to the methods
of the present invention, several methods can be used in order to determine the specific allele carried by an individual at
the given polymorphic base.

In some embodiments, genotyping will be applied to one or more of the markers of SEQ ID Nos: 301-305 and
307-311 or the sequences complementary thereto. In additional embodiments, genotyping will be applied to the markers
of SEQ ID Nos. 306 and 312 as well as one or more of the markers of SEQ ID Nos. 301-305 and 307-311. In some
embodiments, genotyping will be applied to one or more of the 653 biallelic markers obtained above (which include the
sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto). The present invention further
contemplates the genotyping of any biallelic marker within the provided maps, including those that are in linkage
disequilibrium with the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and
51-100 or the sequences complementary thereto) or the markers of SEQ ID Nos. 301-312 or the sequences complementary
thereto.

Most genotyping methods require the previous amplification of a DNA region carrying the polymorphic site of
interest.

The identification of biallelic markers described previously allows the design of appropriate oligonucleotides,
which can be used as primers to amplify a DNA fragment containing the polymorphic site of interest and for the
detection of such polymorphisms.

In particularly preferred embodiments, pairs of primers of SEQ ID Nos: 313-318 and 319-324 may be used to
generate amplicons harboring the markers of SEQ ID Nos: 301-306/307-312 or the sequences complementary thereto. In
further embodiments, pairs of amplification primers may be used to generate amplicons harboring the 653 markers
obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto).
In highly preferred embodiments, pairs of the amplification primers of SEQ ID Nos: 101-150 and 151-200 may be used
to generate amplicons harboring the markers of SEQ ID Nos: 1-50 and 51-100 or the sequences complementary thereto.

It will be appreciated that amplification primers may be designed having any length suitable for their intended
purpose, in particular any length allowing their hybridization with a region of the DNA fragment to be amplified.

It will be further appreciated that the hybridization site of said amplification primers may be located at any
distance from the polymorphic base to be genotyped, provided said amplification primers allow the proper amplification
of a DNA fragment carrying said polymorphic site. The amplification primers may be oligonucleotides of 10, 15, 20 or
more bases in length which enable the amplification of the polymorphic site in the markers. In some embodiments, the
amplification product produced using these primers may be at least 100 bases in length (i.e. on average 50 nucleotides
on each side of the polymorphic base). In other embodiments, the amplification product produced using these primers
may be at least 500 bases in length (i.e. on average 250 nucleotides on each side of the polymorphic base). In still
further embodiments, the amplification product produced using these primers may be at least 1000 bases in length (i.e.
on average 500 nucleotides on each side of the polymorphic base).

The amplification of polymorphic fragments can be carried out as described in Example 8 on DNA samples
extracted as described in Example 5.

As already mentioned, allele frequencies of biallelic markers tested in association studies (individual or
haplotype) may be determined using microsequencing procedures.

A first step in microsequencing procedures consists in designing microsequencing primers adapted to each
biallelic marker to be genotyped. Microsequencing primers hybridize upstream of the polymorphic base to be genotyped,
either with the coding or with the non-coding strand. Microsequencing primers may be oligonucleotides of 8, 10, 15, 20
or more bases in length. Preferably, the 3' end of the microsequencing primer is immediately upstream of the
polymorphic base of the biallelic marker being genotyped, such that upon extension of the primer, the polymorphic base
is the first base incorporated. Such microsequencing primers are included within the scope of the present invention.

In preferred embodiments, the microsequencing primers are those indicated as features within the sequence
listings corresponding to markers of SEQ ID Nos: 325-330/331-336. In some embodiments, the 653 biallelic markers
obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto)
are genotyped using appropriate microsequencing oligonucleotides such as those of SEQ ID Nos. 201-250 or 251-300.

It will be appreciated that the biallelic markers of the present invention may be genotyped using
microsequencing primers having any desirable length, and hybridizing to any of the strands of the marker to be tested,
provided their design is suitable for their intended purpose. In some embodiments, the amplification primers or
microsequencing primers may be labeled. For example, in some embodiments, the amplification primers or
microsequencing primers may be biotinylated.
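
The primer placement described above (3' end immediately upstream of the polymorphic base, on either strand) can be sketched as follows. This is a hypothetical helper for illustration only: the function names, the 0-based indexing convention and the default length of 20 bases are assumptions, and no melting-temperature or specificity checks are attempted.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def microsequencing_primers(sequence, snp_index, length=20):
    """Return (forward, reverse) candidate primers whose 3' end abuts the polymorphic base.

    sequence:  one strand of the amplicon, written 5'->3'.
    snp_index: 0-based position of the polymorphic base in `sequence` (assumed convention).
    length:    primer length in bases (20 is an arbitrary default for this sketch).
    """
    if snp_index < length or snp_index + length >= len(sequence):
        raise ValueError("not enough flanking sequence for primers of this length")
    # Primer identical to the given strand: extension incorporates the polymorphic base first.
    forward = sequence[snp_index - length:snp_index]
    # Primer on the opposite strand: extension incorporates the complement of that base first.
    reverse = reverse_complement(sequence[snp_index + 1:snp_index + 1 + length])
    return forward, reverse
```

For instance, with the toy sequence GATTACAGATTACA, a polymorphic base at index 7 and a length of 5, the two candidates are TTACA and GTAAT.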

Typical microsequencing procedures that can be used in the context of the present invention are described in
Example 13 below.

Example 13 

Genotyping of biallelic markers using microsequencing procedures

Several microsequencing protocols conducted in liquid phase are well known to those skilled in the art. A first
possible detection analysis allowing the allele characterization of the microsequencing reaction products relies on
detecting fluorescent ddNTP-extended microsequencing primers after gel electrophoresis. A first alternative to this
approach consists in performing a liquid phase microsequencing reaction, the analysis of which may be carried out in
solid phase.

For example, the microsequencing reaction may be performed using 5'-biotinylated oligonucleotide primers and
fluorescein-dideoxynucleotides. The biotinylated oligonucleotide is annealed to the target nucleic acid sequence
immediately adjacent to the polymorphic nucleotide position of interest. It is then specifically extended at its 3' end
following a PCR cycle, wherein the labeled dideoxynucleotide analog complementary to the polymorphic base is
incorporated. The biotinylated primer is then captured on a microtiter plate coated with streptavidin. The analysis is
thus entirely carried out in a microtiter plate format. The incorporated ddNTP is detected by a fluorescein antibody -
alkaline phosphatase conjugate.

In practice this microsequencing analysis is performed as follows. 20 µl of the microsequencing reaction is
added to 80 µl of capture buffer (SSC 2X, 25% PEG 8000, 0.25 M Tris pH 7.5, 1.8% BSA, 0.05% Tween 20) and
incubated for 20 minutes on a microtiter plate coated with streptavidin (Boehringer). The plate is rinsed once with
washing buffer (0.1 M Tris pH 7.5, 0.1 M NaCl, 0.1% Tween 20). 100 µl of anti-fluorescein antibody conjugated with
alkaline phosphatase, diluted 1/5000 in washing buffer containing 1.8% BSA, is added to the microtiter plate. The
antibody is incubated on the microtiter plate for 20 minutes. After washing the microtiter plate four times, 100 µl of
4-methylumbelliferyl phosphate (Sigma) diluted to 0.4 mg/ml in 0.1 M diethanolamine pH 9.6, 10 mM MgCl2 are added. The
detection of the microsequencing reaction is carried out on a fluorimeter (Dynatech) after 20 minutes of incubation.

As another alternative, solid phase microsequencing reactions have been developed, for which either the
oligonucleotide microsequencing primers or the PCR-amplified products derived from the DNA fragment of interest are
immobilized. For example, immobilization can be carried out via an interaction between biotinylated DNA and
streptavidin-coated microtitration wells or avidin-coated polystyrene particles.

As a further alternative, the PCR reaction generating the amplicons to be genotyped can be performed directly
in solid phase conditions, following procedures such as those described in WO 96/13009, the disclosure of which is
incorporated herein by reference.

In such solid phase microsequencing reactions, incorporated ddNTPs can either be radiolabeled (see Syvanen,
Clin. Chim. Acta 226:225-236 (1994), the disclosure of which is incorporated herein by reference) or linked to
fluorescein (see Livak and Hainer, Hum. Mutat. 3:379-385 (1994), the disclosure of which is incorporated herein by
reference). The detection of radiolabeled ddNTPs can be achieved through scintillation-based techniques. The detection
of fluorescein-linked ddNTPs can be based on the binding of antifluorescein antibody conjugated with alkaline
phosphatase, followed by incubation with a chromogenic substrate (such as p-nitrophenyl phosphate).

Other possible reporter-detection couples for use in the above microsequencing procedures include:

ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline phosphatase conjugate (see Harju et al., Clin.
Chem. 39(11 Pt 1):2282-2287 (1993), incorporated herein by reference)

biotinylated ddNTP and horseradish peroxidase-conjugated streptavidin with o-phenylenediamine as a substrate (see
WO 92/15712, incorporated herein by reference).

A diagnosis kit based on fluorescein-linked ddNTP with antifluorescein antibody conjugated with alkaline
phosphatase has been commercialized under the name PRONTO by GamidaGen Ltd.

As yet another alternative microsequencing procedure, Nyren et al. (Anal. Biochem. 208:171-175 (1993), the
disclosure of which is incorporated herein by reference) have described a solid-phase DNA sequencing procedure that
relies on the detection of DNA polymerase activity by an enzymatic luminometric inorganic pyrophosphate detection
assay (ELIDA). In this procedure, the PCR-amplified products are biotinylated and immobilized on beads. The
microsequencing primer is annealed and four aliquots of this mixture are separately incubated with DNA polymerase and
one of the four different ddNTPs. After the reaction, the resulting fragments are washed and used as substrates in a
primer extension reaction with all four dNTPs present. The progress of the DNA-directed polymerization reactions is
monitored with the ELIDA. Incorporation of a ddNTP in the first reaction prevents the formation of pyrophosphate during
the subsequent dNTP reaction. In contrast, no ddNTP incorporation in the first reaction gives extensive pyrophosphate
release during the dNTP reaction and this leads to generation of light throughout the ELIDA reactions. From the ELIDA
results, the identity of the first base after the primer is easily deduced.
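
The deduction step of the ELIDA procedure can be sketched as follows: the aliquot in which a ddNTP was incorporated gives little or no light in the subsequent dNTP reaction, so the base is read from the aliquot with the low signal. The function name, the arbitrary light units and the threshold are assumptions of this sketch, not part of the published procedure.

```python
def deduce_first_base(light_by_ddntp, threshold):
    """Return the ddNTP base(s) whose aliquot gave a light signal below `threshold`.

    light_by_ddntp: mapping such as {"A": 950, "C": 40, "G": 1010, "T": 980}, one light
                    reading per aliquot (arbitrary units, hypothetical values).
    threshold:      signal below which incorporation of the ddNTP is inferred (assumed).
    A single base is expected for a homozygous sample; other outcomes need manual review.
    """
    if set(light_by_ddntp) != {"A", "C", "G", "T"}:
        raise ValueError("one reading is expected for each of the four ddNTP aliquots")
    return sorted(base for base, light in light_by_ddntp.items() if light < threshold)
```

With the readings shown in the docstring and a threshold of 200, the deduced first base is C.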

It will be appreciated that several parameters of the above-described microsequencing procedures may be
successfully modified by those skilled in the art without undue experimentation. In particular, high throughput
improvements to these procedures may be elaborated, following principles such as those described further below.

It will be further appreciated that any other genotyping procedure may be applied to the genotyping of biallelic
markers.

Once the candidate region has been delineated using the high density biallelic marker map, a sequence analysis
process will allow the detection of all genes located within said region, together with a potential functional
characterization of said genes. The identified functional features may allow preferred trait-causing candidates to be
chosen from among the identified genes. More biallelic markers may then be generated within said candidate genes, and
used to perform refined association studies that will support the identification of the trait causing gene. Sequence
analysis processes are described in Example 14 below.



Example 14: Sequence Analysis
DNA sequences, such as BAC inserts, containing the region carrying the candidate gene associated with the
detectable trait are sequenced and their sequence is analyzed using automated software which eliminates repeat
sequences while retaining potential gene sequences. The potential gene sequences are compared to numerous databases
to identify potential exons using a set of scoring algorithms such as trained Hidden Markov Models, statistical analysis
models (including promoter prediction tools) and the GRAIL neural network. Preferred databases for use in this analysis,
the construction and use of which are further detailed in Example 22 below, include the following:

NetGene database:

This proprietary database contains sequences of 5' cDNA tags, obtained from a number of tissues and cells.
Currently more than 50,000 different 5' clones representing more than 50,000 different genes are included in NetGene.
The sequences in the NetGene database correspond specifically to the 5' regions of transcripts (first exons) and
therefore allow mapping of the beginning of genes within raw genomic sequences.

NRPU (Non-Redundant Protein-Unique) database:

NRPU is a non-redundant merge of the publicly available NBRF/PIR, Genpept, and SwissProt databases.
Homologies found with NRPU allow the identification of regions potentially coding for already known proteins or related
to known proteins (translated exons).



NREST (Non-Redundant EST database):

NREST is a merge of the EST subsection of the publicly available GenBank database. Homologies found with
NREST allow the location of potentially transcribed regions (translated or non-translated exons).

NRN (Non-Redundant Nucleic acid database):

NRN is a merge of GenBank, EMBL and their daily updates.






Any sequence giving a positive hit with NRPU, NREST or an 'excellent' score using GRAIL and/or other scoring
algorithms is considered a potential functional region, and is then considered a candidate for genomic analysis.

While this first screening allows the detection of the 'strongest' exons, a semi-automatic scan is further
applied to the remaining sequences in the context of the sequence assembly. That is, the sequences neighboring a 5'
site or an exon are submitted to another round of bioinformatics analysis with modified parameters. In this way, new
exon candidates are generated for genomic analysis.
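
The first screening rule stated above can be sketched as a simple filter. The record layout, field names and the string used for the GRAIL score label are hypothetical; they only illustrate the rule that a positive NRPU or NREST hit, or an 'excellent' score, makes a region a candidate.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One analyzed stretch of genomic sequence (hypothetical record layout)."""
    name: str
    nrpu_hit: bool = False      # homology found in the protein database
    nrest_hit: bool = False     # homology found in the EST database
    grail_score: str = "none"   # qualitative score label, e.g. "excellent" (assumed values)


def is_candidate(segment):
    """A positive NRPU or NREST hit, or an 'excellent' score, marks a potential functional region."""
    return segment.nrpu_hit or segment.nrest_hit or segment.grail_score == "excellent"


def first_screening(segments):
    """Split segments into candidates and the remainder left for the semi-automatic scan."""
    candidates = [s for s in segments if is_candidate(s)]
    remaining = [s for s in segments if not is_candidate(s)]
    return candidates, remaining
```

The segments returned as `remaining` correspond to the sequences that the text submits to a further round of analysis with modified parameters.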

Using the above procedures, genes associated with detectable traits may be identified.

Examples 15-23 illustrate the application of the above methods using biallelic markers to identify a gene
associated with a complex disease, prostate cancer, within a ca. 450 kb candidate region. Additional details of the
identification of the gene associated with prostate cancer are provided in the U.S. Patent Application entitled "Prostate
Cancer Gene" (GENSET.018A, Serial No. 08/995,306), the disclosure of which is incorporated herein by reference.



Use of Biallelic Markers to Identify a Gene Associated with Prostate Cancer 
Substantial amounts of LOH data supported the hypothesis that genes associated with distinct cancer types
are located within a particular region of the human genome. More specifically, this region was likely to harbor a gene
associated with prostate cancer. Association studies were performed as described below in order to identify this
prostate cancer gene. A YAC contig containing the genomic region suspected of harboring a gene associated with
prostate cancer was constructed as described in Example 15 below.

Example 15 

YAC Contig Construction in the Candidate Genomic Region
First, a YAC contig which contains the candidate genomic region was constructed as follows. The CEPH-Genethon
YAC map for the entire human genome (Chumakov et al. (1995), supra) was used for detailed contig building in
the genomic region containing genetic markers known to map in the candidate genomic region. Screening data available
for several publicly available genetic markers were used to select a set of CEPH YACs localized within the candidate
region. This set of YACs was tested by PCR with the above mentioned genetic markers as well as with other publicly
available markers supposedly located within the candidate region. As a result of these studies, a YAC STS contig map
was generated around genetic markers known to map in this genomic region. Two CEPH YACs were found to constitute
a minimal tiling path in this region, with an estimated size of ca. 2 Megabases.

During this mapping effort several publicly known STS markers were precisely located within the contig.
Example 16 below describes the identification of sets of biallelic markers within the candidate genomic region.

Example 16 
BAC contig construction and
Biallelic Markers isolation within the candidate chromosomal region.
Next, a BAC contig covering the candidate genomic region was constructed as follows. BAC libraries were
obtained as described in Woo et al., Nucleic Acids Res. 22:4922-4931 (1994), the disclosure of which is incorporated
herein by reference. Briefly, the two whole human genome BamHI and HindIII libraries already described in Example 1
were constructed using the pBeloBAC11 vector (Kim et al. (1996), supra).

The BAC libraries were then screened with all of the above mentioned STSs, following the procedure described
in Example 2 above.

The ordered BACs selected by STS screening and verified by FISH were assembled into contigs and new
markers were generated by partial sequencing of insert ends from some of them. These markers were used to fill the
gaps in the contig of BAC clones covering the candidate chromosomal region having an estimated size of 2 megabases.

Figure 9 illustrates a minimal array of overlapping clones which was chosen for further studies, and the
positions of the publicly known STS markers along said contig.

Selected BAC clones from the contig were subcloned and sequenced, essentially following the procedures
described in Examples 3 and 4.

Biallelic markers lying along the contig were identified following the processes described in Examples 5 and 6.

Figure 9 shows the locations of the biallelic markers along the BAC contig. This first set of markers
corresponds to a medium density map of the candidate locus, with an inter-marker distance averaging 50kb-150kb.

A second set of biallelic markers was then generated as described above in order to provide a very high density
map of the region identified using the first set of markers which can be used to conduct association studies, as
explained below. This very high density map has markers spaced on average every 2-50kb.

The biallelic markers were then used in association studies. DNA samples were obtained from individuals 
suffering from prostate cancer and unaffected individuals as described in Example 17. 

Example 17 

Collection of DNA Samples from Affected and Non-affected Individuals
Prostate cancer patients were recruited according to clinical inclusion criteria based on pathological or radical 
prostatectomy records. Control cases included in this study were both ethnically- and age-matched to the affected 
cases; they were checked for both the absence of all clinical and biological criteria defining the presence or the risk of 
prostate cancer, and for the absence of related familial prostate cancer cases. Both affected and control individuals 
were all unrelated. 

The two following groups of independent individuals were used in the association studies. The first group,
comprising individuals suffering from prostate cancer, contained 185 individuals. Of these 185 cases of prostate
cancer, 47 cases were sporadic and 138 cases were familial. The control group contained 104 non-diseased individuals.

Haplotype analysis was conducted using additional diseased (total samples: 281) and control samples (total
samples: 130), from individuals recruited according to similar criteria.

DNA was extracted from peripheral venous blood of all individuals as described in Example 5.

The frequencies of the biallelic markers in each population were determined as described in Example 18.

Example 18 
Genotyping Affected and Control Individuals

Genotyping was performed using the following microsequencing procedure.



WO 99/04038 PCT/IB98/01193

Amplification was performed on each DNA sample using primers designed as previously explained. The pairs of primers
were used to generate amplicons harboring the biallelic markers 99-123, 4-26, 4-14, 4-77, 99-217, 4-67, 99-213,
99-221, 99-135, 99-1482, 4-73, and 4-65 using the protocols described in Example 6 above.

Microsequencing primers were designed for each of the biallelic markers, as previously described.
After purification of the amplification products, the microsequencing reaction mixture was prepared by adding, in a 20 µl
final volume: 10 pmol microsequencing oligonucleotide, 1 U Thermosequenase (Amersham E79000G), 1.25 µl
Thermosequenase buffer (260 mM Tris HCl pH 9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin
Elmer, Dye Terminator Set 401095) complementary to the nucleotides at the polymorphic site of each biallelic marker
tested, following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of 15 sec at 55°C, 5
sec at 72°C, and 10 sec at 94°C were carried out in a Tetrad PTC-225 thermocycler (MJ Research). The
unincorporated dye terminators were then removed by ethanol precipitation. Samples were finally resuspended in
formamide-EDTA loading buffer and heated for 2 min at 95°C before being loaded on a polyacrylamide sequencing gel.
The data were collected by an ABI PRISM 377 DNA sequencer and processed using the GENESCAN software (Perkin
Elmer).

Following gel analysis, data were automatically processed with software that allows the determination of the 
alleles of biallelic markers present in each amplified fragment. 

The software evaluates such factors as whether the intensities of the signals resulting from the above
microsequencing procedures are weak, normal or saturated, or whether the signals are ambiguous. In addition, the
software identifies significant peaks (according to shape and height criteria). Among the significant peaks, peaks
corresponding to the targeted site are identified based on their position. When two significant peaks are detected for
the same position, each sample is categorized as homozygous or heterozygous based on the height ratio.
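
The height-ratio rule can be sketched as follows. The text does not give the criteria applied by the software, so the minimum peak height and the ratio cut-off below are purely illustrative assumptions, as is the function name.

```python
def call_genotype(height_a, height_b, allele_a, allele_b, min_height=50.0, het_ratio=0.3):
    """Call a genotype from the two peak heights observed at the targeted position.

    min_height and het_ratio are illustrative assumptions; the actual criteria applied
    by the software are not given in the text.
    Returns a pair of alleles, or None when the signal is too weak to call.
    """
    strongest = max(height_a, height_b)
    if strongest < min_height:
        return None  # weak signal: no call
    ratio = min(height_a, height_b) / strongest
    if ratio >= het_ratio:
        return (allele_a, allele_b)  # two comparable peaks: heterozygous
    major = allele_a if height_a > height_b else allele_b
    return (major, major)  # one dominant peak: homozygous
```

A sample with peak heights of 1000 and 900 would be called heterozygous under these assumed settings, while heights of 1000 and 20 would be called homozygous for the first allele.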

Association analyses were then performed using the biallelic markers as described below.

Example 19 
Association Analysis 

Association studies were run in two successive steps. In a first step, a rough localization of the candidate
gene was achieved by determining the frequencies of the biallelic markers of Figure 9 in the affected and unaffected
populations. The results of this rough localization are shown in Figure 10. This analysis indicated that a gene
responsible for prostate cancer was located near the biallelic marker designated 4-67.

In a second phase of the analysis, the position of the gene responsible for prostate cancer was further refined using the
very high density set of markers including the 99-123, 4-26, 4-14, 4-77, 99-217, 4-67, 99-213, 99-221, 99-135,
99-1482, 4-73, and 4-65 markers.

As shown in Figure 11, the second phase of the analysis confirmed that the gene responsible for prostate
cancer was near the biallelic marker designated 4-67, most probably within a ca. 150 kb region comprising the marker.

A haplotype analysis was also performed as described in Example 20.

Example 20
Haplotype analysis

The allelic frequencies of each of the alleles of biallelic markers 99-123, 4-26, 4-14, 4-77, 99-217, 4-67,
99-213, 99-221, and 99-135 were determined in the affected and unaffected populations. Table 4 lists the internal
identification numbers of the markers used in the haplotype analysis, the alleles of each marker, the most frequent allele
in both unaffected individuals and individuals suffering from prostate cancer, the least frequent allele in both unaffected
individuals and individuals suffering from prostate cancer, and the frequencies of the least frequent alleles in each
population.

Table 4

                                  Frequency of least frequent allele **
Markers     Polymorphic base *    Cases       Controls
99-123      zn                    0.35        0.3
4-26        A/G                   0.39        0.45
4-14        C/T                   0.35        0.41
4-77        C/G                   0.33        0.24
99-217      C/T                   0.31        0.23
4-67        C/T                   0.26        0.16
99-213      T/C                   0.45        0.38
99-221      C/A                   0.43        0.43
99-135      A/G                   0.25        0.3

*  most frequent allele/least frequent allele
** standard deviations: 0.023 to 0.031 for controls
                        0.018 to 0.021 for cases



Among all the theoretical potential different haplotypes based on 2 to 9 markers, 11 haplotypes showing a
strong association with prostate cancer were selected. The results of these haplotype analyses are shown in Figure 12.

Figures 11 and 12 aggregate association analysis results with sequencing results, generated following the
procedures further described in Example 21, which permitted the physical order and/or the distance between markers to
be estimated.

The significance of the values obtained in Figure 12 is underscored by the following results of computer
simulations. For the computer simulations, the data from the affected individuals and the unaffected controls were
pooled and randomly allocated to two groups which contained the same number of individuals as the affected and
unaffected groups used to compile the data summarized in Figure 12. A haplotype analysis was run on these artificial
groups for the six markers included in haplotype 5 of Figure 12. This experiment was reiterated 100 times and the
results are shown in Figure 13. Among 100 iterations, only 5% of the obtained haplotypes are present with a p-value
below [illegible], as compared to the p-value of 9e-07 for haplotype 5 of Figure 12. Furthermore, for haplotype
5 of Figure 12, only 6% of the obtained haplotypes have a significance level below 5e-03, while none of them show a
significance level below 5e-03.

Thus, using the data of Figure 13 and evaluating the associations for single marker alleles or for haplotypes
will permit estimation of the risk a corresponding carrier has to develop prostate cancer. It will be appreciated that
significance thresholds of relative risks will be more finely assessed according to the population tested.

Diagnostic techniques for determining an individual's risk of developing prostate cancer may be implemented as
described below for the markers in the maps of the present invention, including the 99-123, 4-14, 4-77, 99-217,
4-67, 99-213, 99-221, and 99-135 markers.

The above haplotype analysis indicated that 171kb of genomic DNA between biallelic markers 4-14 and 99-221
totally or partially contains a gene responsible for prostate cancer. Therefore, the protein coding sequences lying
within this region were characterized to locate the gene associated with prostate cancer. This analysis, described in
further detail below, revealed a single protein coding sequence in the 171 kb genomic region, which was designated as
the PG1 gene.

Example 21

Identification of the Genomic Sequence in the Candidate Region

Template DNA for sequencing the PG1 gene was obtained as follows. BACs E and F from Fig. 9 were subcloned
as previously described. Plasmid inserts were first amplified by PCR on PE 9600 thermocyclers (Perkin-Elmer), using
appropriate primers, AmpliTaq Gold (Perkin-Elmer), dNTPs (Boehringer), buffer and cycling conditions as recommended by the
Perkin-Elmer Corporation.

PCR products were then sequenced using automatic ABI Prism 377 sequencers (Perkin-Elmer, Applied Biosystems
Division, Foster City, CA). Sequencing reactions were performed using PE 9600 thermocyclers (Perkin-Elmer) with standard
dye-primer chemistry and ThermoSequenase (Amersham Life Science). The primers were labeled with the JOE, FAM, ROX
and TAMRA dyes. The dNTPs and ddNTPs used in the sequencing reactions were purchased from Boehringer. Sequencing
buffer, reagent concentrations and cycling conditions were as recommended by Amersham.

Following the sequencing reaction, the samples were precipitated with EtOH, resuspended in formamide loading
buffer, and loaded on a standard 4% acrylamide gel. Electrophoresis was performed for 15 hours at 3000 V on an ABI 377
sequencer, and the sequence data were collected and analyzed using the ABI Prism DNA Sequencing Analysis Software,
version 2.1.1.

The sequence data obtained as described above were transferred to a proprietary database, where quality control
and validation steps were performed. A proprietary base-caller flagged suspect peaks, taking into account the shape of the
peaks, the inter-peak resolution, and the noise level. The proprietary base-caller also performed an automatic trimming. Any
stretch of 25 or fewer bases having more than 4 suspect peaks was considered unreliable and was discarded.
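
The trimming rule stated above can be illustrated with a short sketch. The base-caller itself is proprietary and not described, so the function name `unreliable_mask` and the choice to discard exactly the span running from the first to the last suspect peak of each offending stretch are assumptions made for illustration only.

```python
from typing import List


def unreliable_mask(suspect: List[bool], max_len: int = 25,
                    max_suspect: int = 4) -> List[bool]:
    """Flag bases lying in a stretch of at most `max_len` bases that
    contains more than `max_suspect` suspect peaks.

    `suspect[i]` is True when the (hypothetical) base-caller flagged base i.
    Assumption: the discarded stretch runs from the first to the last
    suspect peak of every qualifying group of suspect peaks.
    """
    positions = [i for i, flagged in enumerate(suspect) if flagged]
    mask = [False] * len(suspect)
    group = max_suspect + 1  # smallest number of peaks that breaks the rule
    for j in range(len(positions) - group + 1):
        first, last = positions[j], positions[j + group - 1]
        if last - first + 1 <= max_len:
            for i in range(first, last + 1):
                mask[i] = True
    return mask


def keep_reliable(bases: str, suspect: List[bool]) -> str:
    """Return the read with every unreliable base removed."""
    mask = unreliable_mask(suspect)
    return "".join(b for b, bad in zip(bases, mask) if not bad)
```

Five suspect peaks packed into nine bases cause those nine bases to be dropped, whereas five suspect peaks spread over forty bases leave the read untouched.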

The sequence fragments from BAC subclones isolated as described above were assembled using Gap4
software from R. Staden (Bonfield et al., 1995). This software allows the reconstruction of a single sequence from
sequence fragments. The sequence deduced from the alignment of different fragments is called the consensus
sequence. Directed sequencing techniques (primer walking) were used to complete sequences and link contigs.
Potential functional sequences were then identified as described in Example 22.

Example 22
Identification of Functional Sequences

Potential exons in BAC-derived human genomic sequences were located by homology searches on protein, nucleic
acid and EST (Expressed Sequence Tags) public databases. Main public databases were locally reconstructed as mentioned
in Example 14. The protein database, NRPU (Non-redundant Protein Unique), is formed by a non-redundant fusion of the
Genpept (Benson et al., Nucleic Acids Res. 24:1-5 (1996), the disclosure of which is incorporated herein by reference),
Swissprot (Bairoch, A. and Apweiler, R., Nucleic Acids Res. 24:21-25 (1996), the disclosure of which is incorporated herein
by reference) and PIR/NBRF (George et al., Nucleic Acids Res. 24:17-20 (1996), the disclosure of which is incorporated
herein by reference) databases. Redundant data were eliminated by using the NRDB software (Benson et al. (1996), supra)
and internal repeats were masked with the XNU software (Benson et al., supra). Homologies found using the NRPU
database allowed the identification of sequences corresponding to potential coding exons related to known proteins.

The EST local database is composed of the gbest section (1-9) of GenBank (Benson et al. (1996), supra) and thus
contains all publicly available transcript fragments. Homologies found with this database allowed the localization of
potentially transcribed regions.

The local nucleic acid database contained all sections of GenBank and EMBL (Rodriguez-Tome et al., Nucleic Acids
Res. 24:6-12 (1996), the disclosure of which is incorporated herein by reference) except the EST sections. Redundant data
were eliminated as previously described.

Similarity searches in protein or nucleic acid databases were performed using the BLAST software (Altschul et al.,
J. Mol. Biol. 215:403-410 (1990), the disclosure of which is incorporated herein by reference). Alignments were refined
using the Fasta software, and multiple alignments used Clustal W. Homology thresholds were adjusted for each analysis
based on the length and the complexity of the tested region as well as on the size of the reference database.

Potential exon sequences identified as above were used as probes to screen cDNA libraries. Extremities of positive
clones were sequenced and the sequence stretches were positioned on the genomic sequence determined above. Primers
were then designed using the results from these alignments in order to enable the cloning of cDNAs derived from the gene
associated with prostate cancer that was identified using the above procedures.

The obtained cDNA molecules were then sequenced and results of Northern blot analysis of prostate mRNAs
supported the existence of a major cDNA having a 5-6kb length. The structure of the gene associated with prostate cancer
was evaluated as described in Example 23.

Example 23 
Analysis of Gene Structure 

The intron/exon structure of the gene was finally completely deduced by aligning the mRNA sequence from the
cDNA obtained as described above and the genomic DNA sequence obtained as described above. This alignment
permitted the determination, in the genomic sequence, of the positions of the introns and exons, the positions of the
start and end nucleotides defining each of the at least 8 exons, the locations and phases of the 5' and 3' splice sites,
the position of the stop codon, and the position of the polyadenylation site. This analysis also yielded
the positions of the coding region in the mRNA, and the locations of the polyadenylation signal and polyA stretch in the
mRNA.

The gene identified as described above comprises at least 8 exons and spans more than 52kb. A G/C rich
putative promoter region was identified upstream of the coding sequence. A CCAAT in the putative promoter was also
identified. The promoter region was identified as described in Prestridge, D.S., Predicting Pol II Promoter Sequences
Using Transcription Factor Binding Sites, J. Mol. Biol. 249:923-932 (1995), the disclosure of which is incorporated
herein by reference.

Additional analysis using conventional techniques, such as a 5'RACE reaction using the Marathon-Ready
human prostate cDNA kit from Clontech (Catalog No. PT1156-1), may be performed to confirm that the 5' end of the cDNA
obtained above is the authentic 5' end in the mRNA.

Alternatively, the 5' sequence of the transcript can be determined by conducting a PCR amplification with a
series of primers extending from the 5' end of the identified coding region.

The above methods were also used to identify biallelic markers in a gene which was an attractive candidate for
a gene associated with asthma. Examples 24-31 show how the use of methods of the present invention allowed this
gene to be identified as a gene responsible, at least partially, for asthma in the studied populations. Additional details of
the identification of the gene associated with asthma are provided in U.S. Provisional Application Serial No.
60/081,893 (Genset.026PR) and U.S. Provisional Patent Application Genset.026PR2, the disclosures of which are
incorporated herein by reference.

Example 24 

Detection of biallelic markers in the candidate gene: DNA extraction
Donors were unrelated and healthy. They presented a sufficient diversity for being representative of a French
heterogeneous population. The DNA from 100 individuals was extracted and tested for the detection of the biallelic
markers.

30 ml of peripheral venous blood were taken from each donor in the presence of EDTA. Cells (pellet) were
collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed by a lysis solution (50 ml final volume:
10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). The solution was centrifuged (10 minutes, 2000 rpm) as many times as
necessary to eliminate the residual red cells present in the supernatant, after resuspension of the pellet in the lysis
solution.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution composed of:
- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
- 200 µl SDS 10%
- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After vigorous agitation, the
solution was centrifuged for 20 minutes at 10000 rpm.

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous supernatant, and the solution
was centrifuged for 30 minutes at 2000 rpm. The DNA solution was rinsed three times with 70% ethanol to eliminate
salts, and centrifuged for 20 minutes at 2000 rpm. The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1
ml water. The DNA concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA).

To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was determined. Only
DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were used in the subsequent examples described
below.
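
The two spectrophotometric checks described above reduce to simple arithmetic, sketched below. The function names and the optional `dilution_factor` parameter are illustrative assumptions; only the conversion factor (1 OD unit at 260 nm = 50 µg/ml DNA) and the accepted purity window (1.8 to 2) come from the text.

```python
def dna_concentration_ug_per_ml(od260: float, dilution_factor: float = 1.0) -> float:
    """Convert an OD reading at 260 nm into a DNA concentration in µg/ml.

    Uses the conversion given in the text: 1 OD unit = 50 µg/ml DNA.
    `dilution_factor` is an assumed convenience for diluted readings.
    """
    return od260 * 50.0 * dilution_factor


def passes_purity_check(od260: float, od280: float,
                        low: float = 1.8, high: float = 2.0) -> bool:
    """Return True when the OD 260 / OD 280 ratio lies between 1.8 and 2."""
    if od280 <= 0:
        raise ValueError("OD 280 must be positive")
    return low <= od260 / od280 <= high
```

For instance, a reading of 0.5 at 260 nm corresponds to 25 µg/ml, and a preparation reading 0.95 at 260 nm and 0.5 at 280 nm (ratio 1.9) would be retained.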

The pool was constituted by mixing equivalent quantities of DNA from each individual.

Example 25

Detection of the biallelic markers: amplification of genomic DNA by PCR
The amplification of specific genomic sequences of the DNA samples of Example 24 was carried out on the
pool of DNA obtained previously. In addition, 50 individual samples were similarly amplified.

PCR assays were performed using the following protocol: 



Final volume                                           25 µl
DNA                                                    2 ng/µl
MgCl2                                                  2 mM
dNTP (each)                                            200 µM
primer (each)                                          2.9 ng/µl
Ampli Taq Gold DNA polymerase                          0.05 unit/µl
PCR buffer (10x = 0.1 M TrisHCl pH 8.3, 0.5 M KCl)     1x



Pairs of first primers were designed to amplify the promoter region, exons, and 3' end of the candidate
asthma-associated gene using the sequence information of the candidate gene and the OSP software (Hillier & Green, 1991).
These first primers were about 20 nucleotides in length and contained a common oligonucleotide tail upstream of the
specific bases targeted for amplification which was useful for sequencing. The synthesis of these primers was
performed following the phosphoramidite method, on a GENSET UFPS 24.1 synthesizer.

DNA amplification was performed on a Genius II thermocycler. After heating at 94°C for 10 min, 40 cycles
were performed. Each cycle comprised: 30 sec at 94°C, 1 min at 55°C, and 30 sec at 72°C. For final elongation, 7 min
at 72°C ended the amplification. The quantities of the amplification products obtained were determined on 96-well
microtiter plates, using a fluorometer and Picogreen as intercalant agent (Molecular Probes).

Example 26 

Detection of the biallelic markers: sequencing of amplified genomic DNA and identification of polymorphisms
The sequencing of the amplified DNA obtained in Example 25 was carried out on ABI 377 sequencers. The
sequences of the amplification products were determined using automated dideoxy terminator sequencing reactions with
a dye terminator cycle sequencing protocol. The products of the sequencing reactions were run on sequencing gels and
the sequences were analyzed as formerly described.
The sequence data were further evaluated using the above mentioned polymorphism analysis software
designed to detect the presence of biallelic markers among the pooled amplified fragments. The polymorphism search
was based on the presence of superimposed peaks in the electrophoresis pattern resulting from different bases occurring
at the same position as described previously.

Six fragments of amplification were analyzed. In these segments, 8 biallelic markers were detected. The
localization of the biallelic markers, the polymorphic bases of each allele, and the frequencies of the most frequent
alleles were as shown in Table 5.



Table 5

Amplicon   Marker name   Origin of DNA   Localization in gene   Polymorphism   Frequency
1          204/326       Ind.            Promoter               A/G            [illegible]6.2 (G)
2          32/357        Pool            Intron 1               A/C            67.7 (C)
3          33/175        Ind.            Exon 2                 C/T            97.3 (C)
3          33/234        Pool            Intron 2               A/C            56.7 (C)
3          33/327        Ind.            Intron 2               C/T            75.3 (T)
5          35/358        Pool            Intron 4               C/G            67.9 (G)
5          35/390        Ind.            Intron 4               C/T            82 (C)
6          36/164        Ind.            Exon 5                 A/G            99.5 (G)



Allelic frequencies were determined in a population of random blood donors of French Caucasian origin. Their wide
range is due to the fact that, besides screening a pool of 100 individuals to generate biallelic markers as described
above, polymorphism searches were also conducted in an individual testing format for 50 samples. This strategy was
chosen here to provide a potential shortcut towards the identification of putative causal mutations in the association
studies using them. As the 36/164 biallelic marker was found in only one individual, this marker was not considered in
the association studies.

The fourth fragment of amplification carrying exon 3 (not shown in the Table) was not polymorphic in the 
tested samples (1 pool + 50 individuals). 
Example 27

Validation of the polymorphisms through microsequencing
The biallelic markers identified in Example 26 were further confirmed and their respective frequencies were
determined through microsequencing. Microsequencing was carried out for each individual DNA sample described in
Example 24.

Amplification from genomic DNA of individuals was performed by PCR as described above for the detection of
the biallelic markers with the same set of PCR primers described above.

The preferred primers used in microsequencing were about 19 nucleotides in length and hybridized just upstream
of the considered polymorphic base.

Five primers hybridized with the non-coding strand of the gene. For the biallelic markers 204/326, 35/358 and 36/164,
primers hybridized with the coding strand of the gene.

The microsequencing reaction was performed as described in Example 18.

Example 28

Association study between asthma and the biallelic markers of the candidate gene: collection of DNA samples from
affected and non-affected individuals
The asthmatic population used to perform association studies in order to establish whether the candidate gene
was an asthma-causing gene consisted of 298 individuals. More than 90% of these 298 asthmatic individuals had a
Caucasian ethnic background.

The control population consisted of 373 unaffected individuals, among which 279 were French (at least 70% were
of Caucasian origin) and 94 American (at least 90% were of Caucasian origin).

DNA samples were obtained from asthmatic and non-asthmatic individuals as described above.

Example 29 

Association study between asthma and the biallelic markers of the candidate gene: genotyping of affected and control
individuals

The general strategy to perform the association studies was to individually scan the DNA samples from all
individuals in each of the populations described above in order to establish the allele frequencies of the above described
biallelic markers in each of these populations.

Allelic frequencies of the above-described biallelic markers in each population were determined by performing
microsequencing reactions on amplified fragments obtained by genomic PCR performed on the DNA samples from each
individual. Genomic PCR and microsequencing were performed as detailed above in Examples 25 and 27 using the
described amplification and microsequencing primers.

Example 30

Association study between asthma and the biallelic markers of the candidate gene
Table 6 shows the results of the association study between five biallelic markers in the candidate gene and
asthma.
Table 6

           Allelic frequencies (%)
Markers    Asthmatics           Controls             Frequency diff.   P value
           (298 individuals)    (373 individuals)
32/357     A 38.6               A 29.8               8.8               7.34x10^-4
33/234     A 49                 A 44.3               4.7               8.86x10^-2
33/327     T 78.5               T 74.6               3.9               1.0x10^-1
35/358     G 72.3               G 66.9               5.4               3.53x10^-2
35/390     T 30.4               T 20.3               10.1              2.33x10^-5



As shown in Table 6, markers 32/357 and 35/390 presented a strong association with asthma, this association being
highly significant (p value = 7.34x10^-4 for marker 32/357 and 2.33x10^-5 for marker 35/390).
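
The single-marker comparison of Table 6 can be sketched as a test on allele counts. The text does not say which test produced the reported p values, so the Pearson chi-square on a 2x2 table of allele counts used here is an assumption, as are the function name and the simplification that every individual was genotyped and contributes two alleles.

```python
import math
from typing import Tuple


def allele_chi_square(freq_cases: float, n_cases: int,
                      freq_controls: float, n_controls: int) -> Tuple[float, float]:
    """Pearson chi-square (one degree of freedom) comparing the frequency of
    one allele between cases and controls.

    Frequencies are fractions (0.386 for 38.6%). Each individual is assumed
    to contribute two alleles. Returns (chi_square, p_value).
    """
    a = freq_cases * 2 * n_cases          # copies of the allele in cases
    b = 2 * n_cases - a                   # copies of the other allele in cases
    c = freq_controls * 2 * n_controls    # copies of the allele in controls
    d = 2 * n_controls - c                # copies of the other allele in controls
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denominator
    # Upper tail of a chi-square with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Marker 32/357: allele A at 38.6% in 298 asthmatics, 29.8% in 373 controls.
chi2_32_357, p_32_357 = allele_chi_square(0.386, 298, 0.298, 373)
```

Under these assumptions the p value for marker 32/357 falls in the same order of magnitude as the value reported in Table 6; exact agreement is not expected, since the number of individuals actually genotyped per marker is not given.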

Three markers showed moderate association when tested independently, namely 33/234, 33/327, and 35/358.
It is worth mentioning that allelic frequencies for each of the biallelic markers of Table 6 were separately
measured within the French control population (279 individuals) and the American control population (94 individuals).
The differences in allele frequencies between the two populations were between 1% and 7%, with p-values above [illegible].
These data confirmed that the combined French/American control population (373 individuals) was homogeneous enough
to be used as a control population for the present association study.

Example 31

Association studies: Haplotype frequency analysis
As already shown, one way of increasing the statistical power of individual markers is by performing
haplotype association analysis. A haplotype analysis for association of markers in the candidate gene and asthma was
performed by estimating the frequencies of all possible haplotypes for biallelic markers 32/357, 33/234, 33/327, 35/358
and 35/390 in the asthmatic and control populations described in Example 30 (Table 6), and comparing these frequencies
by means of a chi square statistical test (one degree of freedom). Haplotype estimations were performed by applying the
Expectation-Maximization (EM) algorithm (Excoffier L & Slatkin M, 1995, Mol. Biol. Evol. 12:921-927), using the
EM-HAPLO program (Hawley ME, Pakstis AJ & Kidd KK, 1994, Am. J. Phys. Anthropol. 18:104).
The results of such haplotype analysis are shown in Table 7.
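
The principle of EM haplotype estimation can be illustrated on the smallest possible case, two biallelic markers. This is a toy sketch and not the EM-HAPLO program used above: the function name, the genotype coding (number of copies of an arbitrarily chosen allele "1" at each marker) and the stopping rule are all assumptions. With two markers, only individuals heterozygous at both markers have an ambiguous phase, and the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Genotype = Tuple[int, int]    # copies (0, 1 or 2) of allele "1" at marker 1 and 2
Haplotype = Tuple[int, int]   # allele (0 or 1) carried at marker 1 and marker 2


def em_two_marker_haplotypes(genotypes: Iterable[Genotype],
                             n_iter: int = 200,
                             tol: float = 1e-10) -> Dict[Haplotype, float]:
    """Estimate the four haplotype frequencies from unphased genotypes."""
    counts = Counter(genotypes)
    n_chrom = 2 * sum(counts.values())
    alleles = {0: (0, 0), 1: (1, 0), 2: (1, 1)}  # genotype -> its two alleles
    fixed = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for (g1, g2), n in counts.items():
        if (g1, g2) == (1, 1):
            continue  # phase ambiguous, handled in the E-step
        a, b = alleles[g1], alleles[g2]
        fixed[(a[0], b[0])] += n
        fixed[(a[1], b[1])] += n
    n_double_het = counts.get((1, 1), 0)

    freqs = {h: 0.25 for h in fixed}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 11 + 00.
        cis = freqs[(1, 1)] * freqs[(0, 0)]
        trans = freqs[(1, 0)] * freqs[(0, 1)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        new = {
            (1, 1): (fixed[(1, 1)] + n_double_het * w) / n_chrom,
            (0, 0): (fixed[(0, 0)] + n_double_het * w) / n_chrom,
            (1, 0): (fixed[(1, 0)] + n_double_het * (1 - w)) / n_chrom,
            (0, 1): (fixed[(0, 1)] + n_double_het * (1 - w)) / n_chrom,
        }
        change = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if change < tol:
            break
    return freqs
```

In an association setting, the function would be run separately on the case and control genotypes and the resulting haplotype frequencies compared, for example with a chi-square test. Extending the approach to five markers requires enumerating every haplotype pair compatible with each genotype.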



Table 7

                                                                               Haplotype frequencies
Markers            32/357       33/234       33/327      35/358       35/390       Asthm.   Controls   Odds ratio   P value
Frequency diff.    8.8          4.7          3.9         5.4          10.1
P value            7.34x10^-4   8.86x10^-2   1.0x10^-1   3.53x10^-2   2.33x10^-5
Haplotype 1        A                                                  T            0.2      0.11       2.02         8.47x10^-6
Haplotype 2                     A            T           G                         0.27     0.18       1.68         2.81x10^-4
Haplotype 3        A            A            T           G            T            0.18     0.09       2.22         3.95x10^-5

A two-marker haplotype covering markers 32/357 and 35/390 (haplotype 1, AT alleles respectively) presented
a p value of 8.47x10^-6, an odds ratio of 2.02 and haplotype frequencies of 0.2 for asthmatic and 0.11 for control
populations respectively.

A three-marker haplotype covering markers 33/234, 33/327 and 35/358 (haplotype 2, ATG alleles respectively)
presented a p value of 2.81x10^-4, an odds ratio of 1.68 and haplotype frequencies of 0.27 for asthmatic and 0.18 for
control populations respectively.

A five-marker haplotype covering markers 32/357, 33/234, 33/327, 35/358 and 35/390 (haplotype 3, AATGT
alleles respectively) presented a p value of 3.95x10^-5, an odds ratio of 2.22 and haplotype frequencies of 0.18 for
asthmatic and 0.09 for control populations respectively.
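
The odds ratios quoted for the three haplotypes are consistent with the usual ratio of odds computed from the haplotype frequencies in the two populations. The text does not state the formula used, so the sketch below is an assumption, and the function name is illustrative.

```python
def haplotype_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the haplotype in cases divided by the odds in controls.

    Both arguments are haplotype frequencies expressed as fractions
    strictly between 0 and 1.
    """
    for f in (freq_cases, freq_controls):
        if not 0.0 < f < 1.0:
            raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Frequencies quoted for haplotypes 1, 2 and 3 (asthmatics, controls).
ratios = [haplotype_odds_ratio(c, k)
          for c, k in ((0.2, 0.11), (0.27, 0.18), (0.18, 0.09))]
```

With the frequencies quoted above, the three ratios agree to within 0.01 with the odds ratios of 2.02, 1.68 and 2.22 listed in Table 7.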

Haplotype association analysis thus increased the statistical power of the individual marker association
studies when compared to single-marker analysis (from p values between 10^-1 and 2x10^-5 for the individual markers to p
values between 3x10^-4 and 8x10^-6 for the three-marker haplotype, haplotype 2).

The significance of the values obtained for the haplotype association analysis was evaluated by the following
computer simulation test. The genotype data from the asthmatic and control individuals were pooled and randomly
allocated to two groups which contained the same number of individuals as the trait positive and trait negative groups
used to produce the data summarized in Table 7. A haplotype analysis was then run on these artificial groups for the
three haplotypes presented in Table 7. This experiment was reiterated 1000 times and the results are shown in Table 8.
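
The simulation described above is a label-permutation test, which can be sketched generically as follows. The helper names, the per-individual coding (copies of an allele or haplotype, 0 to 2) and the "+1" empirical p-value estimator are assumptions for illustration; in the procedure of the text, haplotype frequencies would be re-estimated in each artificial group before computing the chi-square.

```python
import random
from typing import Callable, Dict, List, Sequence


def count_chi_square(cases: Sequence[int], controls: Sequence[int]) -> float:
    """Chi-square on a 2x2 table of chromosome counts.

    Each individual is coded as the number of copies (0, 1 or 2) it carries
    of the allele or haplotype of interest (an illustrative coding).
    """
    a = sum(cases)
    b = 2 * len(cases) - a
    c = sum(controls)
    d = 2 * len(controls) - c
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denominator


def permutation_test(cases: Sequence[int], controls: Sequence[int],
                     statistic: Callable[[Sequence[int], Sequence[int]], float],
                     n_iter: int = 1000, seed: int = 0) -> Dict[str, float]:
    """Pool both groups, reallocate individuals at random to two groups of
    the original sizes, and recompute the statistic n_iter times."""
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled: List[int] = list(cases) + list(controls)
    n_cases = len(cases)
    null: List[float] = []
    for _ in range(n_iter):
        rng.shuffle(pooled)
        null.append(statistic(pooled[:n_cases], pooled[n_cases:]))
    exceed = sum(1 for value in null if value >= observed)
    return {
        "observed": observed,
        "average": sum(null) / n_iter,
        "maximal": max(null),
        # Conservative empirical p value (never exactly zero).
        "p_value": (exceed + 1) / (n_iter + 1),
    }
```

The `average` and `maximal` entries correspond to the kind of summary reported for the permuted data sets, and `observed` to the chi-square obtained on the real groups.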
Table 8

                                      Permutation test
Haplotype              Chi-Square     Average Chi-Square   Maximal Chi-Square   P value
Haplotype 1 (AT)       19.70          1.2                  11.6                 1.0x10^[illegible]
Haplotype 2 (ATG)      13.49          1.2                  10.5                 1.0x10^[illegible]
Haplotype 3 (AATGT)    16.66          1.2                  9.3                  1.0x10^[illegible]

The results in Table 8 show that among 1000 iterations only 1% of the obtained haplotypes has a p value
comparable to the one obtained in Table 7.

These results clearly validate the statistical significance of the haplotypes obtained (haplotypes 1, 2 and 3,
Table 7).

While Examples 15-31 illustrate the use of the maps and markers of the present invention for identifying a new
gene associated with a complex disease within a 2Mb genomic region and for establishing that a candidate gene is, at least
partially, responsible for a disease, the maps and markers of the present invention may also be used to identify one or
more biallelic markers or one or more genes associated with other detectable phenotypes, including drug response, drug
toxicity, or drug efficacy. The biallelic markers used in such drug response analyses or shown, using the methods of the
present invention, to be associated with such traits, may lie within or near genes responsible for or partly responsible for
a particular disease, for example a disease against which the drug is meant to act, or may lie within genomic regions
which are not responsible for or partly responsible for a disease. For example, the genomic region harboring markers
associated with a particular drug response may carry a drug metabolism gene, or a gene encoding a protein with a role in
the drug response mechanism. Thus, biallelic markers within or near genes known to be involved in drug response,
toxicity, or efficacy or genes suspected of being involved in drug response, toxicity, or efficacy may be used to identify
individuals likely to respond positively or negatively to drug treatment. In the context of the present invention, a "positive
response" to a medicament can be defined as comprising a reduction of the symptoms related to the disease or condition
to be treated. In the context of the present invention, a "negative response" to a medicament can be defined as
comprising either a lack of positive response to the medicament which does not lead to a symptom reduction or to a
side-effect observed following administration of the medicament.

Drug efficacy, response and tolerance/toxicity can be considered as multifactorial traits involving a genetic
component in the same way as complex diseases such as Alzheimer's disease, prostate cancer, hypertension or diabetes.
As such, the identification of genes involved in drug efficacy and toxicity could be achieved following a positional cloning
approach, e.g. performing linkage analysis within families in order to obtain the subchromosomal location of the gene(s).
However, this type of analysis is actually impractical in the case of drug responsiveness, due to the lack of availability of
familial cases. In fact, the likelihood of having more than one individual in a particular family being exposed to the same
drug at the same time is very low. Therefore, drug efficacy and toxicity can only be analyzed as sporadic traits.

In order to conduct association studies to analyze the individual response to a given drug in groups of patients
affected with a disease, up to four groups are screened to determine their patterns of biallelic markers using the
techniques described above. The four groups are:

- Non-diseased or random controls,

- Diseased patients/drug responders,

- Diseased patients/drug non-responders,

- Diseased patients/drug side effects.

In preferred embodiments, the above mentioned groups are recruited according to phenotyping criteria having
the characteristics described above, so that the phenotypes defining the different groups are non-overlapping, preferably
extreme phenotypes.

In highly preferred embodiments, such phenotyping criteria have the bimodal distribution described above.
The final number and composition of the groups for each drug association study is adapted
to the distribution of the above described phenotypes within the studied population.

After selecting a suitable population, association and haplotype analyses may be performed as
described herein to identify one or more biallelic markers associated with drug response, preferably drug toxicity or drug
efficacy. The identification of such one or more biallelic markers allows one to conduct diagnostic tests to determine
whether the administration of a drug to an individual will result in drug response, preferably drug toxicity, or drug
efficacy.

The methods described above for identifying a gene associated with prostate cancer and biallelic markers
indicative of a risk of suffering from asthma may be utilized to identify genes associated with other detectable
phenotypes. In particular, the above methods may be used with any marker or combination of markers included in the
maps of the present invention, including the 653 biallelic markers obtained above (which include the sequences of SEQ
ID Nos. 1-50 and 51-100 or the sequences complementary thereto), the PG1 markers, the asthma-associated markers,
and the Apo E markers of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto. As described above,
the general strategy to perform the association studies using the maps and markers of the present invention is to scan
two groups of individuals (trait positive individuals and trait negative controls) characterized by a well defined phenotype
in order to measure the allele frequencies of the biallelic markers in each of these groups. Preferably, the frequencies of
markers with inter-marker spacing of about 150 kb are determined in each group. More preferably, the frequencies of
markers with inter-marker spacing of about 75 kb are determined in each group. Even more preferably, markers with
inter-marker spacing of about 50 kb, about 37.5kb, about 30kb, or about 25kb will be tested in each population. For
genome-wide studies, it will be preferred to measure the frequencies of about 20,000, or about 40,000 biallelic markers
in each group. In a highly preferred embodiment, the frequencies of about 60,000, about 80,000, about 100,000, or
about 120,000 biallelic markers are determined in each group. In some embodiments, haplotype analyses may be run
using groups of markers located within regions spanning less than 1kb, from 1 to 5kb, from 5 to 10kb, from 10 to 25kb,
from 25 to 50kb, from 50 to 150kb, from 150 to 250kb, from 250 to 500kb, from 500kb to 1Mb, or more than 1Mb.

Allele frequency can be measured using microsequencing techniques described herein; preferred high
throughput microsequencing procedures are further exemplified below; it will be further appreciated that any other large
scale genotyping method suitable with the intended purpose contemplated herein may also be used.

In some embodiments of the present invention a computer-based system may support the on-line coordination
between the identification of biallelic markers and the corresponding analysis of their frequency in the different groups.

It will be appreciated that it is not necessary to use a full high density biallelic marker map in order to start a
genome-wide association study. It is sufficient to generate and use a first set of about 20,000 markers (one marker per
BAC, average inter-marker spacing of about 150kb). Maps having higher densities of biallelic markers (two or more
markers per BAC, average inter-marker spacing of about 75kb or less) may then be generated by starting first on those
BACs for which a candidate association has been established at the first step.
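
The marker counts and inter-marker spacings quoted in this section are related by simple division. The sketch below assumes a genome length of about 3,000,000 kb, a figure that is not stated in this passage but is implied by pairing 20,000 markers with a 150kb spacing; the function name is illustrative.

```python
def markers_for_spacing(spacing_kb: float, genome_kb: float = 3_000_000) -> int:
    """Approximate number of evenly spaced markers needed to cover the genome.

    `genome_kb` defaults to an assumed genome length of 3,000,000 kb.
    """
    if spacing_kb <= 0:
        raise ValueError("spacing must be positive")
    return round(genome_kb / spacing_kb)


# Spacings mentioned in the text, in kb.
counts = {s: markers_for_spacing(s) for s in (150, 75, 50, 37.5, 30, 25)}
```

Under that assumption, spacings of 150, 75, 50, 37.5, 30 and 25 kb correspond to about 20,000, 40,000, 60,000, 80,000, 100,000 and 120,000 markers, matching the preferred marker numbers listed earlier.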

In cases when one or more candidate regions have previously been delineated, such as cases where a particular
gene or genomic region is suspected of being associated with a trait, local excerpts of biallelic marker maps having
densities above one marker per 150kb may be exploited using BACs harboring said genomic regions, or genes, or portions
thereof. In these cases also, successive association studies may be performed using sets of biallelic markers showing
increasing densities, preferably from about one every 150 kb to about one every 75kb; more preferably, sets of markers
with inter-marker spacing below about 50kb, below about 37.5kb, below about 30kb, most preferably below about 25
kb, will be used.

Haplotype analyses may also be conducted using groups of biallelic markers within the candidate region. The
biallelic markers included in each of these groups may be located within a genomic region spanning less than 1kb, from 1
to 5kb, from 5 to 10kb, from 10 to 25kb, from 25 to 50kb, from 50 to 150kb, from 150 to 250kb, from 250 to 500kb,
from 500kb to 1Mb, or more than 1Mb. It will be appreciated that the ordered DNA fragments containing these groups of
biallelic markers need not completely cover the genomic regions of those lengths but may instead be incomplete contigs
having one or more gaps therein. As discussed in further detail below, biallelic markers may be used in association studies
and haplotype analyses regardless of the completeness of the corresponding physical contig harboring them, provided linkage
disequilibrium between the markers can be assessed.
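
As an illustration of how linkage disequilibrium between two biallelic markers might be assessed, the sketch below computes the standard pairwise measures D, D' and r^2 from haplotype and allele frequencies. It is a minimal example rather than the procedure used herein; the function and parameter names are hypothetical, and the frequencies are assumed to have been estimated beforehand.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> dict:
    """Return D, D' and r^2 for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and allele B (marker 2)
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("both markers must be polymorphic (0 < frequency < 1)")
    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D is normalised by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the alleles at the two markers.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Two markers whose alleles always travel together on the same haplotype.
complete = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
# Two markers whose alleles are statistically independent.
independent = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5)
```

Because these measures depend only on frequencies, they can be computed for any pair of markers, whether or not the contig between them is complete.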

As described above, if a positive association with a trait such as a disease, or a drug efficacy and/or toxicity, 
is identified using the biallelic markers and maps of the present invention, the maps will provide not only the 
confirmation of the association, but also a shortcut towards the identification of the gene involved in the trait under 
study. As described above, since the markers showing positive association to the trait are in linkage disequilibrium with 
the trait loci, the causal gene will be physically located in the vicinity of these markers. Regions identified through
association studies using high density maps will on average have a 20 to 40 times shorter length than those identified by
linkage analysis (2 to 20 Mb). 

As described above, once a positive association is confirmed with the high density biallelic marker maps of the
present invention, BACs from which the most highly associated markers were derived are completely sequenced and the
mutations in the causal gene are searched by applying genomic analysis tools. As described above, once a region
harboring a gene associated with a detectable trait has been sequenced and analyzed, the candidate functional regions
(e.g. exons and splice sites, promoters and other regulatory regions) are scanned for mutations by comparing the
sequences of a selected number of controls and cases, using adequate software.
In some embodiments, trait positive samples being compared to identify causal mutations are selected among
those carrying the ancestral haplotype; in these embodiments, control samples are chosen from individuals not carrying
said ancestral haplotype.

In further embodiments, trait positive samples being compared to identify causal mutations are selected among
those showing haplotypes that are as close as possible to the ancestral haplotype; in these embodiments, control
samples are chosen from individuals not carrying any of the haplotypes selected for the case population.

The mutation detection procedure is essentially similar to that used for biallelic site identification. A pair of
oligonucleotide primers is designed in order to amplify the sequences to be tested. In preferred embodiments, priority is
given to the testing of functional sequences; in such embodiments, sequences covering every exon/promoter predicted
region, preferably including potential splice sites, are determined and compared between the T+ and T- populations.
Amplification is carried out on DNA samples from T+ and T- individuals using the polymerase chain reaction under the
above described conditions. To be sequenced, amplification products from genomic PCR may be subjected to automated
dideoxy terminator sequencing reactions and electrophoresed on ABI 377 sequencers. Following gel image analysis and
DNA sequence extraction, ABI sequence data are automatically analyzed to detect the presence of sequence variations
among T+ and T- individuals. Sequences are preferably verified by comparing the sequences of both DNA strands of
each individual.
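
The comparison step described above can be pictured with the following sketch, which flags positions where aligned reads from T+ and T- individuals differ and checks a forward read against its reverse-strand read. It is a simplified illustration only: it assumes reads already aligned to equal length and containing only A, C, G and T, and all names and sequences are hypothetical.

```python
# Translation table for complementing A, C, G and T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strands_agree(forward_read: str, reverse_read: str) -> bool:
    """True when the reverse-strand read confirms the forward-strand read."""
    return forward_read.upper() == reverse_complement(reverse_read)


def variant_positions(sequences: dict[str, str]) -> dict[int, set[str]]:
    """Map each 0-based position showing more than one base to the bases observed."""
    lengths = {len(s) for s in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    variants = {}
    for pos, column in enumerate(zip(*sequences.values())):
        bases = {b.upper() for b in column}
        if len(bases) > 1:
            variants[pos] = bases
    return variants


# Hypothetical aligned reads from one T+ and one T- individual.
reads = {"T_plus_1": "ACGTA", "T_minus_1": "ACCTA"}
candidates = variant_positions(reads)
```

Positions reported by `variant_positions` would only be candidate polymorphisms; they would still need to be confirmed on both strands and in a larger population.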

It is preferred that candidate polymorphisms be then verified by screening a larger population of cases and
controls by means of any genotyping procedure such as those described herein, preferably using a microsequencing
technique in an individual test format. Polymorphisms are considered as candidate mutations when present in cases and
controls at frequencies compatible with the expected association results.
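
One common way to compare allele frequencies between cases and controls is a Pearson chi-square test on a 2x2 table of allele counts. The sketch below is offered as an illustration under that assumption and is not necessarily the statistic used herein; the function name and the counts are hypothetical.

```python
import math


def allele_chi_square(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Pearson chi-square (1 degree of freedom, no continuity correction).

    case_a, case_b: counts of the two alleles among case chromosomes
    ctrl_a, ctrl_b: counts of the two alleles among control chromosomes
    Returns (chi2, p_value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A on 60 of 100 case chromosomes and 40 of 100 control chromosomes.
chi2, p_value = allele_chi_square(60, 40, 40, 60)
```

A small p-value here only indicates that the allele frequencies differ between the two groups; whether the difference is compatible with the expected association results remains a separate judgment.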

The maps and biallelic markers of the present invention may also be used to identify patterns of biallelic
markers associated with detectable traits resulting from polygenic interactions. The analysis of genetic interaction
between alleles at unlinked loci requires individual genotyping using the techniques described herein. The analysis of
allelic interaction among a selected set of biallelic markers with appropriate p-values can be considered as a haplotype
analysis, similar to those described in further detail within the present invention.
Use of Biallelic Markers to Identify Individuals Likely to Exhibit a Detectable
Trait Associated with a Particular Allele of a Known Gene
In addition to their utility in searches for genes associated with detectable traits on a genome-wide, chromosome-wide,
or subchromosomal level, the maps and biallelic markers of the present invention may be used in more targeted
approaches for identifying individuals likely to exhibit a particular detectable trait or individuals who exhibit a particular
detectable trait as a consequence of possessing a particular allele of a gene associated with the detectable trait. For
example, the biallelic markers and maps of the present invention may be used to identify individuals who carry an allele of a
known gene that is suspected of being associated with a particular detectable trait. In particular, the target genes may be
genes having alleles which predispose an individual to suffer from a specific disease state. In other cases, the target genes
may be genes having alleles that predispose an individual to exhibit a desired or undesired response to a drug or other
pharmaceutical composition, a food, or any administered compound. The known gene may encode any of a variety of types
of biomolecules. For example, the known genes targeted in such analyses may be genes known to be involved in a particular
step in a metabolic pathway in which disruptions may cause a detectable trait. Alternatively, the target genes may be genes
encoding receptors or ligands which bind to receptors in which disruptions may cause a detectable trait, genes encoding
transporters, genes encoding proteins with signaling activities, genes encoding proteins involved in the immune response,
genes encoding proteins involved in hematopoiesis, or genes encoding proteins involved in wound healing. It will be
appreciated that the target genes are not limited to those specifically enumerated above, but may be any gene known to
be or suspected of being associated with a detectable trait.

As previously mentioned, the maps and markers of the present invention may be used to identify genes
associated with drug response. Accordingly, the present invention comprises a method of using a drug comprising
obtaining a nucleic acid sample from an individual, determining the identity of the polymorphic base of one or more
biallelic markers obtained by the methods described above which is or are associated with a positive response to
treatment with the drug or one or more biallelic markers obtained by the methods described above which is or are
associated with a negative response to treatment with the drug, and administering the drug to the individual if the
nucleic acid sample contains one or more alleles of biallelic markers associated with a positive response to treatment
with the drug or if said nucleic acid sample lacks one or more alleles of biallelic markers associated with a negative
response to the drug. In some embodiments of the method, the administering step comprises administering the drug to
the individual if the nucleic acid sample contains one or more alleles of biallelic markers associated with a positive
response to treatment with the drug and the nucleic acid sample lacks one or more alleles of biallelic markers associated
with a negative response to the drug.
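
The two decision rules described in the preceding paragraph (positive allele present or negative allele absent, versus both conditions together) can be sketched as follows. This is an illustrative reading only: it assumes that "lacks" means the sample carries none of the listed negative-response alleles, and the allele labels and function name are hypothetical.

```python
def should_administer(sample_alleles, positive_alleles, negative_alleles,
                      require_both=False):
    """Decide whether the drug would be administered under the rules sketched above.

    sample_alleles:   alleles detected in the individual's nucleic acid sample
    positive_alleles: alleles associated with a positive response to the drug
    negative_alleles: alleles associated with a negative response to the drug
    require_both:     if True, require a positive allele AND no negative allele
    """
    sample = set(sample_alleles)
    has_positive = bool(sample & set(positive_alleles))
    # Assumption: "lacks" is read as carrying none of the negative-response alleles.
    lacks_negative = not (sample & set(negative_alleles))
    if require_both:
        return has_positive and lacks_negative
    return has_positive or lacks_negative


# Hypothetical allele labels of the form "<marker>:<base>".
POSITIVE = {"marker1:T"}
NEGATIVE = {"marker2:C"}

responder = should_administer({"marker1:T"}, POSITIVE, NEGATIVE)
mixed_strict = should_administer({"marker1:T", "marker2:C"}, POSITIVE, NEGATIVE,
                                 require_both=True)
```

The same predicate could in principle be reused to screen candidates for a clinical trial, since the inclusion rule is stated in the same terms.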

The biallelic markers of the present invention may also be used to select individuals for inclusion in
the clinical trials of a drug. By selecting individuals who are likely to respond favorably to a drug for inclusion in the
trial, the effectiveness of the drug can be assessed without lowering the measured effectiveness as a result of including
non-responders or negative responders in the clinical trial. Perhaps more importantly, using such selection may avoid
including patients who may suffer from undesirable side effects if administered the drug under trial, thus increasing the
safety of clinical trials. Accordingly, the present invention also includes a method of selecting an individual for inclusion
in a clinical trial of a drug comprising obtaining a nucleic acid sample from an individual, determining the identity of the
polymorphic base of one or more biallelic markers obtained by the methods described above which is or are associated
with a positive response to treatment with the drug or one or more biallelic markers associated with a negative response
to treatment with the drug in the nucleic acid sample, and including the individual in the clinical trial if the nucleic acid
sample contains one or more alleles of biallelic markers obtained by the methods described above which is or are
associated with a positive response to treatment with said drug or if the nucleic acid sample lacks one or more alleles of
biallelic markers associated with a negative response to the drug. In one embodiment of the method, the inclusion step
comprises including the individual in the clinical trial if the nucleic acid sample contains one or more alleles of biallelic
markers associated with a positive response to treatment with the drug and the nucleic acid sample lacks one or more
alleles of biallelic markers associated with a negative response to the drug.

In particular embodiments, one or several of the ApoE linked markers of SEQ ID Nos. 301-305/307-311 or the
sequences complementary thereto may be used in targeted approaches to identify individuals who are likely to develop
Alzheimer's disease, or to identify individuals who do suffer from Alzheimer's disease. In other embodiments, one or more of
the markers of SEQ ID Nos. 306 and 312 and one or more of the ApoE linked markers of SEQ ID Nos. 301-305/307-311
or the sequences complementary thereto are genotyped in approaches to identify individuals who are likely to develop
Alzheimer's disease, or to identify individuals who do suffer from Alzheimer's disease. In further embodiments, one or several
of the PG1 linked markers may be tested in targeted approaches to identify individuals who are likely to develop prostate
cancer, or to identify individuals who do suffer from prostate cancer. Finally, individuals likely to be asthmatic, or asthmatic
individuals, can be identified using one or more of the asthma-associated markers to conduct the procedures of the present
invention.

Given the high number of cancer types in which the PG1 chromosomal region is involved, it will be appreciated that
the PG1 markers may be employed to identify individuals at risk of developing cancers other than prostate cancer, or to
identify individuals suffering from cancers other than prostate cancer. It will be further appreciated that the asthma
associated markers may be tested to identify individuals likely to exhibit, or exhibiting, inflammatory traits other than the
asthmatic state (e.g. arthritis, or psoriasis, among others). The present invention provides adequate methods to establish
associations between markers, such as those mentioned above, and candidate traits expressly contemplated herein, thus
legitimating the corresponding targeted approaches to identify individuals likely to exhibit, or exhibiting, said candidate traits.

In some embodiments, the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos.
1-50 and 51-100 or the sequences complementary thereto) may be used in targeted approaches to identify individuals at
risk of developing a detectable trait, for example a complex disease or desired/undesired drug response, or to identify
individuals exhibiting said trait. The present invention provides methods to establish putative associations between any of
the biallelic markers described herein and any detectable traits, including those specifically described herein.

To use the maps and markers of the present invention in further targeted approaches, biallelic markers which are
in linkage disequilibrium with any of the above disclosed markers may be identified. In cases where one or more biallelic
markers of the present invention have been shown to be associated with a detectable trait, more biallelic markers in linkage
disequilibrium with said associated biallelic markers may be generated and used to perform targeted approaches aiming at
identifying individuals exhibiting, or likely to exhibit, said detectable trait according to the methods provided herein.

Furthermore, in cases where a candidate gene is suspected of being associated with a particular detectable trait or
suspected of causing the detectable trait, biallelic markers in linkage disequilibrium with said candidate gene may be
identified and used in targeted approaches, such as the approaches utilized above for the asthma-associated gene and the
Apo E gene.
Biallelic markers that are in linkage disequilibrium with markers associated with a detectable trait, or with genes
associated with a detectable trait or suspected of being so, are identified by performing single marker analyses, haplotype
association analyses, or linkage disequilibrium measurements on samples from trait positive and trait negative individuals as
described above using biallelic markers lying in the vicinity of the target marker or gene. In this manner, a single biallelic
marker or a group of biallelic markers may be identified which indicate that an individual is likely to possess the detectable
trait or does possess the detectable trait as a consequence of a particular allele of the target marker or gene.

Nucleic acid samples from individuals to be tested for predisposition to a detectable trait or possession of a
detectable trait as a consequence of a particular allele of the target gene may be examined using the diagnostic methods
described below.

Diagnostic Methods

To use the maps and biallelic markers of the present invention to diagnose whether an individual is predisposed to
express a detectable trait or whether the individual expresses a detectable trait as a result of a particular mutation, one or
more biallelic markers indicative of such a predisposition or causative mutation are identified by performing association
studies and haplotype analysis on affected and non-affected individuals as described above.

The diagnostic techniques of the present invention may employ a variety of methodologies to determine
whether a test subject has a biallelic marker pattern associated with an increased risk of developing a detectable trait or
whether the individual suffers from a detectable trait as a result of a particular mutation, including methods which
enable the analysis of individual chromosomes for haplotyping, such as family studies, single sperm DNA analysis or
somatic hybrids.

The trait analyzed using the present diagnostics may be any detectable trait, including diseases, drug response,
drug efficacy, or drug toxicity. A "positive" drug response may refer to a response indicating either some drug efficacy
or no drug toxicity. Diagnostics which analyze drug response, drug efficacy, or drug toxicity may be used to determine
whether an individual should be treated with a particular drug. For example, if the diagnostic indicates a likelihood that
an individual will respond positively to treatment with a particular drug, the drug may be administered to the individual.
Conversely, if the diagnostic indicates that an individual is likely to respond negatively to treatment with a particular
drug, an alternative course of treatment may be prescribed. A negative response may be defined as either the absence
of an efficacious response or the presence of toxic side effects.

Clinical drug trials represent another application for the maps and markers of the present invention. One or
more markers indicative of drug response, drug efficacy, or drug toxicity may be identified using the techniques
described above. Thereafter, potential participants in clinical trials of the drug may be screened to identify those
individuals most likely to respond favorably to the drug and exclude those likely to experience side effects. In that way,
the effectiveness of drug treatment may be measured in individuals who respond positively to the drug, without lowering
the measurement as a result of the inclusion of individuals who are unlikely to respond positively in the study and
without risking undesirable safety problems.

In each of the diagnostic methods, a nucleic acid sample is obtained from the test subject and the biallelic
marker pattern is determined for one or more of the biallelic markers included in the maps of the present invention,
including the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the
sequences complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the Apo E biallelic
markers, including those of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto. In other
embodiments, the biallelic marker pattern of one or more of the markers of SEQ ID Nos. 306 and 312 is determined in
addition to determining the biallelic marker pattern of one or more of the biallelic markers included in the maps of the
present invention, including the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50
and 51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic
markers, and the Apo E biallelic markers, including those of SEQ ID Nos. 301-305/307-311 or the sequences
complementary thereto. In some embodiments, the biallelic marker pattern is determined by conducting an amplification
reaction to generate amplicons containing the polymorphic bases of the one or more biallelic markers to be genotyped.
The identities of the polymorphic bases of the one or more biallelic markers to be analyzed may be determined using a
variety of methods, including hybridization assays which specifically detect amplification products containing particular
alleles of the one or more biallelic markers, and microsequencing reactions which identify the polymorphic bases of the
one or more biallelic markers to be analyzed.

While the following discussion utilizes the 653 biallelic markers obtained above (which include the sequences
of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the
PG1 biallelic markers, and the Apo E biallelic markers as examples of the diagnostics of the present invention, it will be
appreciated that the same diagnostics may be used in conjunction with any marker or any group of markers included in
the maps of the present invention.

Examples of amplification primers enabling the amplification, from subjects' genomic DNA samples, of DNA
fragments that carry each of the markers of SEQ ID Nos: 1-50 and 51-100 or the sequences complementary thereto, are
oligonucleotides of SEQ ID Nos: 101-150 and 151-200; pairs of corresponding primers for a given biallelic marker may
be reconstituted by choosing the adequate upstream oligonucleotide from SEQ ID Nos: 101-150 together with the
corresponding downstream oligonucleotide from SEQ ID Nos: 151-200.

SEQ ID Nos: 1-50 correspond to the sequence identification number for a first allele of the biallelic markers of
SEQ ID Nos: 1-50 and 51-100 and SEQ ID Nos: 51-100 correspond to the sequence identification number for a second
allele of the biallelic markers of SEQ ID Nos: 1-50 and 51-100.

SEQ ID Nos: 313-318 correspond to sequence identification numbers of upstream amplification primers
that may be used to generate amplification products containing the polymorphic bases of the biallelic markers of
respective SEQ ID Nos: 301-306/307-312. SEQ ID Nos: 319-324 correspond to downstream amplification primers that
may be used to generate amplification products containing the polymorphic bases of the biallelic markers of respective
SEQ ID Nos: 301-306/307-312.

For all markers of SEQ ID Nos: 1-50/51-100 and 301-306/307-312 or the sequences complementary thereto, 
the enclosed listings indicate the position and identity of the polymorphic base in each biallelic marker. Potential 
microsequencing primers are also included in the sequence listing. The sequences of SEQ ID Nos. 201-250 may be used 
in microsequencing procedures such as those described herein to determine the sequence of the polymorphic bases of the
biallelic markers of SEQ ID Nos. 1-50/51-100. The sequences of SEQ ID Nos. 325-330 or 331-336 may be used in
microsequencing procedures such as those described herein to determine the sequence of the polymorphic bases of the
biallelic markers of SEQ ID Nos. 301-306/307-312.

All listings indicate the internal identification number corresponding to the biallelic marker to which the listed sequence
is related.
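
The numbering scheme described above lends itself to a simple lookup. The sketch below is purely illustrative and rests on an assumption not stated explicitly in the text, namely that the sequences correspond position by position (the n-th marker pairs with the n-th entry of each block); the function name is hypothetical, and the sequence listing itself remains the authoritative source for each pairing.

```python
def seq_ids_for_marker(index: int) -> dict:
    """Return the SEQ ID Nos. assumed to belong to one of the 50 listed markers.

    index: 1-based position of the marker within SEQ ID Nos. 1-50.
    Assumption: each block of 50 sequences is listed in the same marker order.
    """
    if not 1 <= index <= 50:
        raise ValueError("index must be between 1 and 50")
    return {
        "allele_1": index,                      # SEQ ID Nos. 1-50
        "allele_2": index + 50,                 # SEQ ID Nos. 51-100
        "upstream_primer": index + 100,         # SEQ ID Nos. 101-150
        "downstream_primer": index + 150,       # SEQ ID Nos. 151-200
        "microsequencing_primer": index + 200,  # SEQ ID Nos. 201-250
    }


first_marker = seq_ids_for_marker(1)
last_marker = seq_ids_for_marker(50)
```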

One aspect of the present invention is a method for determining whether an individual is at risk of developing
Alzheimer's Disease or whether an individual suffers from Alzheimer's Disease as a consequence of possessing the Apo E
ε4 site A allele. The method involves obtaining a nucleic acid sample from the individual and determining whether the
nucleic acid sample contains one or more markers indicative of a risk of developing Alzheimer's Disease or one or more
markers indicative that the individual suffers from Alzheimer's Disease as a result of possessing the Apo E ε4 site A
allele. In one embodiment, the method comprises determining the identity of the polymorphic base of one or more
biallelic markers selected from the group consisting of SEQ ID Nos. 301-305/307-312 or the sequences complementary
thereto in the nucleic acid sample. In a further embodiment, the method involves determining whether the nucleic acid
sample contains the sequence of SEQ ID No. 306 (the C allele of marker 99-2452/54 containing the Apo E ε4 site A
allele) or the sequence complementary thereto. In a further embodiment the method comprises determining whether the
nucleic acid sample contains SEQ ID No. 311 (the T allele of marker 99-365/344) or the sequence complementary
thereto. In another embodiment, the method comprises determining whether the nucleic acid sample contains SEQ ID
No. 311 (the T allele of marker 99-365/344) and SEQ ID No. 306 (the C allele of marker 99-2452/54 containing the Apo
E ε4 site A allele) or the sequences complementary thereto.

In still a further embodiment, the method comprises determining whether the nucleic acid sample contains SEQ
ID Nos. 302, 301, 303, and 304 or the sequences complementary thereto. In still a further embodiment, the method
comprises determining whether the nucleic acid sample contains SEQ ID Nos. 302, 303, and 304 or the sequences
complementary thereto. In a further embodiment the method comprises determining whether the nucleic acid sample
contains SEQ ID No. 311 (the T allele of marker 99-365/344) or the sequence complementary thereto.

In some embodiments, the step of determining the identity of the polymorphic base of one or more biallelic
markers selected from the group consisting of SEQ ID Nos. 301-305 and SEQ ID Nos. 307-311 or the sequences
complementary thereto in the nucleic acid sample comprises conducting an amplification reaction on said nucleic acid
sample using one or more of the amplification primers selected from the group consisting of SEQ ID Nos. 313-317 and
SEQ ID Nos. 319-323 and determining the identity of the polymorphic base in said one or more biallelic markers.

In some embodiments, the identity of the polymorphic base may be determined using one or more of the
microsequencing primers listed as SEQ ID Nos. 325-329 or 331-335. In embodiments comprising the step of
determining whether the nucleic acid sample contains the sequence of SEQ ID No. 306, the method may comprise
conducting an amplification reaction on the nucleic acid sample using the pair of amplification primers consisting of SEQ
ID Nos. 318 and 324. In some embodiments, the step of determining whether the nucleic acid sample contains the
sequence of SEQ ID No. 306 comprises conducting a microsequencing reaction using one of the microsequencing primers
listed as SEQ ID Nos. 330 or 336.
Another aspect of the present invention relates to a method of determining whether an individual is at risk of
developing a trait or whether an individual expresses a trait as a consequence of possessing a particular trait-causing
allele. Alternatively, another aspect of the present invention relates to a method of determining whether an individual is
at risk of developing a plurality of traits or whether an individual expresses a plurality of traits as a result of possessing
particular trait-causing alleles. These methods involve obtaining a nucleic acid sample from the individual and
determining whether the nucleic acid sample contains one or more markers indicative of a risk of developing the trait or
one or more markers indicative that the individual expresses the trait as a result of possessing a particular trait-causing
allele. In one embodiment, the methods comprise determining the identity of the polymorphic base of one or more
biallelic markers in the maps of the present invention, including any of the 653 biallelic markers obtained above (which
include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated
biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers. In a further embodiment, the methods
comprise determining the identities of the polymorphic bases of at least two, at least three, at least five, at least eight,
at least 20, at least 100, at least 200, at least 300, at least 400, between 400 and 2,000, between 2,000 and 4,000,
between 4,000 and 10,000, between 10,000 and 20,000 or more than 20,000 of the biallelic markers in the maps of
the present invention, including any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID
Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1
biallelic markers, and the new Apo E biallelic markers.

In some embodiments, the step of determining the identity of the polymorphic base of one or more biallelic
markers in the maps of the present invention, including any of the 653 biallelic markers obtained above (which include
the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated
biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers, comprises conducting an amplification
reaction on said nucleic acid sample using appropriate amplification primers and determining the identity of the
polymorphic base in said one or more biallelic markers. In some embodiments, the identity of the polymorphic base may
be determined using appropriate microsequencing primers.

As described herein, the diagnostics may be based on a single biallelic marker or a group of biallelic markers.
Without wishing to be limited to any particular value, it is preferred that the biallelic marker used in single marker
diagnostics, either as a positive basis for further diagnostic tests or as a preliminary starting point for early preventive
therapy, exhibit a p value in preliminary screening association analyses of about 1x10'^ or less. More preferably the p
value is about 1 x 10"* or less.

Similarly, without wishing to be limited to any particular value for diagnostics based on more than one biallelic
marker, it is preferred that the haplotype exhibit a p value of 1 x 10^ or less, still more preferably 1 x 10'^ or less and
most preferably of about 1x10'® or less in a preliminary screening haplotype analysis. These values are believed to be
applicable to any association studies involving single or multiple marker combinations. Significance thresholds may be
refined according to the methods previously described.

Example 32 describes methods for determining the biallelic marker pattern in a nucleic acid sample. 
Example 32

A nucleic acid sample is obtained from an individual to be tested for susceptibility to a detectable trait or for a
detectable trait caused by a particular mutation. The nucleic acid sample may be an RNA sample or a DNA sample.

A PCR amplification is conducted using primer pairs which generate amplification products containing the
polymorphic nucleotides of one or more biallelic markers associated with such a predisposition or causative mutation.
For example, the amplification products may contain the polymorphic bases of one or more of the biallelic markers in the
maps of the present invention, including any of the 653 biallelic markers obtained above (which include the sequences of
SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the
PG1 biallelic markers, and the Apo E biallelic markers or biallelic markers in linkage disequilibrium with any of these
biallelic markers. In some embodiments, the PCR amplification is conducted using primer pairs which generate
amplification products containing the polymorphic nucleotides of several biallelic markers. For example, in one
embodiment, amplification products containing the polymorphic bases of one or more biallelic markers in the maps of the
present invention, including any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID
Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1
biallelic markers, and the Apo E biallelic markers, biallelic markers which are in linkage disequilibrium therewith or with a
causative mutation associated with a detectable phenotype may be generated. In another embodiment, amplification
products containing the polymorphic bases of five or more biallelic markers in the maps of the present invention,
including any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and
51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and
the Apo E biallelic markers, biallelic markers which are in linkage disequilibrium therewith or with a causative mutation
associated with a detectable phenotype may be generated. In another embodiment, amplification products containing the
polymorphic bases of 20 or more biallelic markers in the maps of the present invention, including any of the 653 biallelic
markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary
thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the Apo E biallelic markers, biallelic
markers which are in linkage disequilibrium therewith or with the causative mutation may be generated. In another
embodiment, amplification products containing the polymorphic bases of 100 or more biallelic markers in the maps of the
present invention, including any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID
Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1
biallelic markers, and the Apo E biallelic markers, biallelic markers which are in linkage disequilibrium therewith or with a
causative mutation associated with a detectable phenotype may be generated. In another embodiment, amplification
products containing the polymorphic bases of 200 or more biallelic markers in the maps of the present invention,
including any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and
51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and
the Apo E biallelic markers, biallelic markers which are in linkage disequilibrium therewith or with a causative mutation
associated with a detectable phenotype may be generated. In another embodiment, amplification products containing the
polymorphic bases of 300 or more biallelic markers in the maps of the present invention, including any of the 653
biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences
complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the Apo E biallelic
markers, biallelic markers which are in linkage disequilibrium therewith or with the causative mutation may be
generated. In another embodiment, amplification products containing the polymorphic bases of 400 or more biailGlic 
markers in the maps of tlic present invention, includino any of Ihe the G53 biallelic markers obtainuil above (which 
include the sequences of SHO ID Nos. 1-SO and 51-100 or the sequences complementary tliereto). the asthma-associated 
biallelic markors, the PG1 biaOelic maikcrs. and the Apo E biallelic markers, biaUclic markers wluch are in tinkayu 
disequilibrium therewith or with a causative motaUon associated with a detectable phonutypc may be ucncrated. 

The primers used to generate the amplification products may be designed as described herein. Representative
amplification primers for generating amplification products containing the polymorphic bases of the biallelic markers of
SEQ ID Nos. 1-50 and 51-100 are provided as SEQ ID Nos. 101-150/151-200 in the accompanying Sequence Listing.
The PCR primers may be oligonucleotides of 10, 15, 20 or more bases in length which enable the amplification of the
polymorphic site in the markers. In some embodiments, the amplification product produced using these primers may be
at least 100 bases in length (i.e. about 50 nucleotides on each side of the polymorphic base). In other embodiments, the
amplification product produced using these primers may be at least 500 bases in length (i.e. about 250 nucleotides on
each side of the polymorphic base). In still further embodiments, the amplification product produced using these primers
may be at least 1000 bases in length (i.e. about 500 nucleotides on each side of the polymorphic base).
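
The relationship between a minimum amplicon length and the flanking sequence on each side of the polymorphic base can
be sketched as follows. This is only an illustration: the function name amplicon_window, the 0-based coordinates, and
the choice to center the window on the polymorphic base are assumptions made for the example and are not taken from
the specification.

```python
def amplicon_window(snp_pos, min_length, seq_len):
    """Return (start, end) half-open coordinates of a window around snp_pos.

    The window keeps about min_length // 2 nucleotides on each side of the
    polymorphic base (e.g. about 50 per side for a 100-base amplicon).
    Coordinates are 0-based; the window is clipped at the sequence ends,
    so it can be shorter than min_length near an end of the sequence.
    """
    flank = min_length // 2
    start = max(0, snp_pos - flank)
    end = min(seq_len, snp_pos + flank + 1)
    return start, end


if __name__ == "__main__":
    # Polymorphic base at position 500 of a hypothetical 1001-base region.
    for length in (100, 500, 1000):
        print(length, amplicon_window(500, length, 1001))
```

Primers would then be chosen so that they flank the returned window; the primer selection itself is not modeled here.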

Table 9 lists the internal identification numbers of the 50 localized markers described herein and the Apo E
markers described herein, the SEQ ID Nos. for each of the two alleles of these biallelic markers, the SEQ ID Nos. of
representative upstream and downstream amplification primers which can be used to generate amplification products
including the polymorphic bases of these biallelic markers, and the SEQ ID Nos. of microsequencing primers which can be
used to determine the identities of the polymorphic bases of these markers.

Table 10 

Marker         SEQ ID Nos        SEQ ID Nos               SEQ ID Nos
(Genset code)  First    Second   Amplification primers    Microsequencing primers
               allele   allele   Upstream   Downstream    1       2
99-2269        11
99-2271        12
99-2647        49       99       149        199           249     299
39-2B49        50       100      150        200           250     300

It will be appreciated that the primers listed in Table 9 are merely exemplary and that any other set of primers
which produce amplification products containing the polymorphic nucleotides of one or more of the biallelic markers of
SEQ ID Nos. 1-50 and 51-100 or biallelic markers in linkage disequilibrium therewith or with a causative mutation for a
detectable trait or a combination thereof may be used in the diagnostic methods. It will also be appreciated that these
diagnostic methods may be performed with any biallelic marker or combination of biallelic markers included in the maps
of the present invention.

Following the PCR amplification, the identities of the polymorphic bases of one or more of the biallelic markers
in the nucleic acid sample are determined. The identities of the polymorphic bases may be determined using the
microsequencing procedures described in Example 13. It will be appreciated that the microsequencing primers listed as
SEQ ID Nos. 201-250 and 251-300 are merely exemplary and that any primer having a 3' end near the polymorphic
nucleotide, and preferably immediately adjacent to the polymorphic nucleotide, may be used. Similarly, it will be
appreciated that microsequencing analysis may be performed for any marker or combination of markers in the maps of
the present invention.
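
As a rough illustration of a primer whose 3' end lies immediately adjacent to the polymorphic nucleotide, the sketch
below takes the bases directly 5' of the polymorphic position on the given strand. The function name
microsequencing_primer, the 0-based indexing, and the default primer length of 19 bases are assumptions chosen for the
example; they do not describe the primers of the Sequence Listing.

```python
def microsequencing_primer(sequence, snp_index, length=19):
    """Return the `length` bases ending immediately 5' of the polymorphic base.

    `sequence` is read 5' to 3' and `snp_index` is the 0-based position of
    the polymorphic base, so the 3' end of the returned primer is the base
    at snp_index - 1. Extending the primer by one nucleotide would therefore
    incorporate the base complementary to the polymorphic site.
    """
    if snp_index < length:
        raise ValueError("not enough upstream sequence for this primer length")
    return sequence[snp_index - length:snp_index]


if __name__ == "__main__":
    # Toy sequence; the polymorphic base is the first G (index 8).
    print(microsequencing_primer("AAAACCCCGGGGTTTT", 8, length=4))
```

A primer for the opposite strand could be derived in the same way from the reverse complement of the downstream sequence.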

Alternatively, the microsequencing analysis may be performed as described in Pastinen et al., Genome
Research 7:606-614 (1997), the disclosure of which is incorporated herein by reference, and which is described in more
detail below.

Alternatively, the PCR product may be completely sequenced to determine the identities of the polymorphic
bases in the biallelic markers. In another method, the identities of the polymorphic bases in the biallelic markers are
determined by hybridizing the amplification products to microarrays containing allele specific oligonucleotides specific
for the polymorphic bases in the biallelic markers. The use of microarrays comprising allele specific oligonucleotides is
described in more detail below.

It will be appreciated that the identities of the polymorphic bases in the biallelic markers may be determined
using techniques other than those listed above, such as conventional dot blot analyses.

Nucleic acids used in the above diagnostic procedures may comprise at least 10 consecutive nucleotides,
including the polymorphic bases, of the biallelic markers in the maps of the present invention, including any of the 653
biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences
complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic
markers, including those of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto. Alternatively, the
nucleic acids used in the above diagnostic procedures may comprise at least 15 consecutive nucleotides, including the
polymorphic bases, of the biallelic markers in the maps of the present invention, including any of the 653 biallelic
markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary
thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers,
including those of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto. In some embodiments, the
nucleic acids used in the above diagnostic procedures may comprise at least 20 consecutive nucleotides, including the
polymorphic bases, of the biallelic markers in the maps of the present invention, including any of the 653 biallelic
markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary
thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers,
including those of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto. In still other embodiments,
the nucleic acids used in the above diagnostic procedures may comprise at least 30 consecutive nucleotides, including
the polymorphic bases, of the biallelic markers in the maps of the present invention, including any of the 653 biallelic
markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary
thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers,
including those of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto. In further embodiments, the
nucleic acids used in the above diagnostic procedures may comprise more than 30 consecutive nucleotides, including the
polymorphic bases, of the biallelic markers in the maps of the present invention, including any of the 653 biallelic
markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary
thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers,
including those of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto. In still further embodiments,
the nucleic acids used in the above diagnostic procedures may comprise the entire sequence of the biallelic markers in
the maps of the present invention, including any of the 653 biallelic markers obtained above (which include the
sequences of SEQ ID Nos. 1-50 and 51-100 or the sequences complementary thereto), the asthma-associated biallelic
markers, the PG1 biallelic markers, and the new Apo E biallelic markers, including those of SEQ ID Nos. 301-305/307-311
or the sequences complementary thereto. In some embodiments the nucleic acids used in the diagnostic procedures
are longer than the sequences of SEQ ID Nos. 1-50, 51-100, 301-305 and 307-311 because they contain nucleotides
adjacent to these sequences.
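
The notion of a nucleic acid comprising a given number of consecutive nucleotides of a marker, including the
polymorphic base, can be illustrated by enumerating every such fragment. The sketch below is illustrative only; the
function name fragments_containing and the 0-based index of the polymorphic base are assumptions made for the example.

```python
def fragments_containing(sequence, snp_index, length):
    """List every run of `length` consecutive nucleotides of `sequence`
    that includes the polymorphic base at 0-based position snp_index."""
    first_start = max(0, snp_index - length + 1)
    last_start = min(snp_index, len(sequence) - length)
    # The range is empty when the sequence is shorter than `length`.
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


if __name__ == "__main__":
    # Toy marker sequence with the polymorphic base at index 3.
    for fragment in fragments_containing("ACGTACGT", 3, 4):
        print(fragment)
```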

The diagnostics of the present invention may also employ nucleic acid arrays attached to DNA chips or any
other suitable solid support, including beads. As used herein, the term array means a one dimensional, two dimensional or
multidimensional arrangement of a plurality of nucleic acids of sufficient length to permit specific detection of nucleic acids
capable of hybridizing thereto.

DNA chips allow the integration of micro-biochemical processes (such as DNA hybridization), systems of signal
detection (such as fluorescence) and data processing into a single system which can be used to obtain information on
polymorphism. The solid surface of the chip is often made of silicon or glass but it can be a polymeric membrane.
Efficient access to polymorphism information is obtained through a basic structure comprising high-density arrays of
oligonucleotide probes attached to a solid support (the chip) at selected positions. The immobilization of arrays of DNA
probes on solid supports has been rendered possible by the development of a technology generally identified as "Very
Large Scale Immobilized Polymer Synthesis" (VLSIPS™) and in which, typically, probes are immobilized in a high density
array on a solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents 5,143,854 and
5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, the disclosures of which are
incorporated herein by reference, which describe methods for forming oligonucleotide arrays through techniques such as
light-directed synthesis techniques.

In designing strategies aimed at providing arrays of nucleotides immobilized on solid supports, further
presentation strategies were developed to order and display the probe arrays on the chips in an attempt to maximize
hybridization patterns and sequence information. Examples of such presentation strategies are disclosed in PCT
Publications WO 94/12305, WO 94/11530, WO 97/29212 and WO 97/31256, the disclosures of which are incorporated
herein by reference.

Each DNA chip can contain thousands to millions of individual synthetic DNA probes arranged in a grid-like
pattern and miniaturized to the size of a dime.

The chip technology has been successfully used to detect mutations in numerous cases. For example, the
screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant strains, and in the protease
gene of HIV-1 virus (see Hacia et al., Nat. Genet. 14:441-447 (1996); Shoemaker et al., Nat. Genet. 14:450-456 (1996);
Kozal et al., Nat. Med. 2:753-759 (1996), the disclosures of which are incorporated herein by reference). At least three
companies propose chips able to detect biallelic polymorphisms: Affymetrix (GeneChip), Hyseq (HyChip and HyGnostics),
and Protogene Laboratories.

In some embodiments, the efficiency of hybridization of nucleic acids in the sample with the probes attached to
the chip may be improved by using polyacrylamide gel pads isolated from one another by hydrophobic regions in which
the DNA probes are covalently linked to an acrylamide matrix.

The polymorphic bases present in the biallelic marker or markers of the sample nucleic acids are determined as
follows. Probes which contain at least a portion of one or more of the biallelic markers of the present invention are
synthesized either in situ or by conventional synthesis and immobilized on an appropriate chip using methods known to
the skilled technician.

The nucleic acid sample which includes the candidate region to be analyzed is isolated, amplified with primers
capable of generating an amplification product containing the polymorphic bases of one or more biallelic markers, and
labeled with a reporter group. The reporter group can be a fluorescent group such as phycoerythrin. The labeled nucleic
acid is then incubated with the probes immobilized on the chip using a fluidics station. For example, Manz et al. (Adv. in
Chromatogr. 33:1-66 (1993), the disclosure of which is incorporated herein by reference) describe the fabrication of
fluidics devices, and particularly microcapillary devices, in silicon and glass substrates.

After the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected.
The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic
acids generated in the amplification of the sample DNA, which is now bound to the probes attached to the chip. Probes
that perfectly match a sequence of the nucleic acid sample generally produce stronger signals than those that have
mismatches. Since the sequence and position of each probe immobilized on the chip is known, the identity of the nucleic
acid hybridized to a given probe can be determined.

For single-nucleotide polymorphism analyses, sets of four oligonucleotides are generally designed (one for each
possible base) that span each position of a portion of the candidate region found in the nucleic acid sample, differing only
in the identity of the central base. The relative intensity of hybridization to each series of probes at a particular location
allows the identification of the base corresponding to the central base of the probe. For example, to detect single
nucleotide polymorphisms such as those in the present biallelic markers, oligonucleotides having each of the two allelic
bases at their central position are affixed to the chip. The amplification products resulting from amplification of the
nucleic acids in the sample are hybridized to the chip under high stringency (at lower salt concentration and higher
temperature over shorter time periods) to facilitate specific detection of the polymorphic sequences present in the
nucleic acid sample.
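
The probe design and base identification described above can be sketched in a few lines. The sketch is illustrative
only: the function names, the probe half-width, and the rule of simply taking the probe with the highest intensity are
assumptions made for the example, and real hybridization data would need background correction and quality thresholds.

```python
BASES = "ACGT"


def central_base_probes(region, center, half_width):
    """Build four probes spanning the same stretch of `region`,
    differing only in the base at the central position `center` (0-based)."""
    if center < half_width or center + half_width >= len(region):
        raise ValueError("probe would extend past the ends of the region")
    left = region[center - half_width:center]
    right = region[center + 1:center + half_width + 1]
    return {base: left + base + right for base in BASES}


def call_central_base(intensities):
    """Return the base whose probe gave the strongest hybridization signal.

    `intensities` maps each central base to a measured signal; a perfectly
    matched probe is expected to give a stronger signal than mismatched ones.
    """
    return max(intensities, key=intensities.get)


if __name__ == "__main__":
    probes = central_base_probes("TTTTACGGGGG", center=5, half_width=2)
    print(probes)
    # Hypothetical scanner readings for the four probes.
    print(call_central_base({"A": 0.1, "C": 0.9, "G": 0.2, "T": 0.1}))
```

For a biallelic marker only the two probes carrying the allelic bases would need to be affixed to the chip.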

The use of direct electric field control improves the determination of single base mutations (Nanogen). A
positive field increases the transport rate of negatively charged nucleic acids and results in a 10-fold increase of the
hybridization rates. Using this technique, single base pair mismatches are detected in less than 15 sec (see Sosnowski et
al., Proc. Natl. Acad. Sci. USA 94:1119-1123 (1997), the disclosure of which is incorporated herein by reference).

Another technique which can be used to analyze polymorphisms includes multicomponent integrated systems
which miniaturize and compartmentalize processes such as restriction enzyme digestion, PCR reactions, and capillary
electrophoresis in a single functional device. An example of such technique is disclosed in US patent 5,588,136, the
disclosure of which is incorporated herein by reference, which concerns the integration of PCR amplification and
capillary electrophoresis in chips. Integrated systems are best applied with microfluidic systems. These systems
comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer included on a microchip. The
movements of the samples are controlled by electric forces applied across different areas of the microchip to create
functional microscopic valves and pumps with no moving parts. Regulating or varying the voltage controls the liquid flow
at intersections between the micro-machined channels and changes the liquid flow rate for pumping across different
sections of the microchip.

In the case of biallelic marker analyses, the micro-chip integrates nucleic acid amplification, a microsequencing
reaction (such as the one described above), capillary electrophoresis and a detection method such as laser-induced
fluorescence detection.

In a first step, the DNA samples are amplified, preferably by PCR. Then, the amplification products are
subjected to automated microsequencing reactions using ddNTPs (specific fluorescence for each ddNTP) and the
appropriate oligonucleotide microsequencing primers which hybridize just upstream of the targeted polymorphic base.
The microsequencing reactions may employ primers capable of being extended to the polymorphic bases of the biallelic
markers. Preferably, the microsequencing primers comprise a sequence terminating at the base immediately preceding
the polymorphic base of the biallelic markers. Once the extension at the 3' end is completed, the primers are separated
from the unincorporated fluorescent ddNTPs by capillary electrophoresis. The separation medium used in capillary
electrophoresis can for example be polyacrylamide, polyethyleneglycol or dextran. The incorporated ddNTPs in the
single-nucleotide primer extension products are identified by fluorescence detection. Preferably, the micro-chip can be used to
process at least 96 samples in parallel. More preferably, the micro-chip can be used to process at least 384 samples in
parallel. Preferably, the microchip is designed for use with detection procedures using four color laser induced
fluorescence detection of the ddNTPs.
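
Once the fluorescence of each incorporated ddNTP has been measured, a genotype can be assigned from the relative
signals of the two allelic bases. The following is a minimal sketch under stated assumptions: the function name
call_genotype, the min_fraction cutoff of 0.2, and the example signal values are hypothetical and are not taken from
the specification.

```python
def call_genotype(signals, alleles, min_fraction=0.2):
    """Assign a genotype from the fluorescence of the incorporated ddNTPs.

    `signals` maps a base to its measured fluorescence and `alleles` is the
    pair of allelic bases of the biallelic marker. If one allele contributes
    less than `min_fraction` of the combined signal, the sample is called
    homozygous for the other allele; otherwise it is called heterozygous.
    Returns None when neither allele gives any signal.
    """
    first, second = alleles
    first_signal = signals.get(first, 0.0)
    second_signal = signals.get(second, 0.0)
    total = first_signal + second_signal
    if total <= 0:
        return None
    fraction_first = first_signal / total
    if fraction_first >= 1 - min_fraction:
        return first + first
    if fraction_first <= min_fraction:
        return second + second
    return first + second


if __name__ == "__main__":
    # Hypothetical readings for three samples at an A/G marker.
    for reading in ({"A": 950, "G": 30}, {"A": 480, "G": 520}, {"A": 10, "G": 900}):
        print(reading, call_genotype(reading, ("A", "G")))
```

A fixed cutoff is the simplest possible rule; in practice the thresholds would be calibrated per marker from control
samples.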

Any one or more alleles of the biallelic markers in the maps of the present invention, or fragments thereof
containing the polymorphic bases, may be fixed to a solid support such as a microchip or other immobilizing surface. The
fragments of these nucleic acids may comprise at least 10, at least 15, at least 20, at least 25, or more than 25
consecutive nucleotides of the biallelic markers described herein. Preferably, the fragments include the polymorphic bases of
the biallelic markers.

A nucleic acid sample is applied to the immobilizing surface and analyzed to determine the identities of the
polymorphic bases of one or more of the biallelic markers. In some embodiments, the solid support may also include one or
more of the amplification primers described herein, or fragments comprising at least 10, at least 15, or at least 20
consecutive nucleotides thereof, for generating an amplification product containing the polymorphic bases of the biallelic
markers to be analyzed in the sample.

Another embodiment of the present invention is a solid support which includes one or more of the microsequencing
primers listed in the accompanying Sequence Listing, or fragments comprising at least 10, at least 15, or at least 20
consecutive nucleotides thereof and having a 3' terminus immediately upstream of the polymorphic base of the
corresponding biallelic marker, for determining the identity of the polymorphic base of the one or more biallelic markers fixed
to the solid support.

For example, one embodiment of the present invention is an array of nucleic acids fixed to a solid support, such as
a microchip, bead, or other immobilizing surface, comprising one or more of the biallelic markers in the maps of the present
invention or a fragment comprising at least 10, at least 15, at least 20, at least 25, or more than 25 consecutive nucleotides
thereof including the polymorphic base. For example, the array may comprise one or more of any of the 653 biallelic
markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100), the asthma-associated biallelic
markers, the PG1 biallelic markers, and the new Apo E biallelic markers (including SEQ ID Nos. 301-305/307-311) or the
sequences complementary thereto, or a fragment comprising at least 10, at least 15, at least 20, at least 25, or more than
25 consecutive nucleotides thereof including the polymorphic base. In a further embodiment, the array comprises at least
five of the biallelic markers in the maps of the present invention or a fragment comprising at least 10, at least 15, at least
20, at least 25, or more than 25 consecutive nucleotides thereof including the polymorphic base. For example, the arrays
may comprise at least five of any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID
Nos. 1-50 and 51-100), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic
markers (including the sequences of SEQ ID Nos. 301-305/307-311) or the sequences complementary thereto, or a
fragment comprising at least 10, at least 15, at least 20, at least 25, or more than 25 consecutive nucleotides thereof
including the polymorphic base. In a further embodiment the array comprises at least 10 of the biallelic markers in the
maps of the present invention or a fragment comprising at least 10, at least 15, at least 20, at least 25, or more than 25
consecutive nucleotides thereof including the polymorphic base. For example, the array may comprise at least 10 of any of
the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100), the
asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers (including the sequences of
SEQ ID Nos. 301-305/307-311) or the sequences complementary thereto, or a fragment comprising at least 10, at least 15,
at least 20, at least 25, or more than 25 consecutive nucleotides thereof including the polymorphic base. In a further
embodiment the array comprises at least 20 of the biallelic markers in the maps of the present invention or a fragment
comprising at least 15 consecutive nucleotides thereof including the polymorphic base. For example, the array may comprise
at least 20 of any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and
51-100), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers (including
the sequences of SEQ ID Nos. 301-305/307-311) or the sequences complementary thereto, or a fragment comprising at
least 10, at least 15, at least 20, at least 25, or more than 25 consecutive nucleotides thereof including the polymorphic
base. In a further embodiment the array comprises at least 100 of the biallelic markers in the maps of the present
invention or a fragment comprising at least 10, at least 15, at least 20, at least 25, or more than 25 consecutive nucleotides
thereof including the polymorphic base. For example, the array may comprise at least 100 of any of the 653 biallelic
markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100), the asthma-associated biallelic
markers, the PG1 biallelic markers, and the new Apo E biallelic markers (including the sequences of SEQ ID Nos.
301-305/307-311) or the sequences complementary thereto, or a fragment comprising at least 10, at least 15, at least 20, at
least 25, or more than 25 consecutive nucleotides thereof including the polymorphic base. In a further embodiment the
array comprises at least 200 of the biallelic markers in the maps of the present invention or a fragment comprising
at least 10, at least 15, at least 20, at least 25, or more than 25 consecutive nucleotides thereof including the polymorphic
base. For example, the array may comprise at least 200 of any of the 653 biallelic markers obtained above (which include
the sequences of SEQ ID Nos. 1-50 and 51-100), the asthma-associated biallelic markers, the PG1 biallelic markers, and
the new Apo E biallelic markers (including the sequences of SEQ ID Nos. 301-305/307-311) or the sequences
complementary thereto, or a fragment comprising at least 10, at least 15, at least 20, at least 25, or more than 25
consecutive nucleotides thereof including the polymorphic base. In a further embodiment the array comprises at least 300
of the biallelic markers in the maps of the present invention or a fragment comprising at least 10, at least 15, at least 20, at
least 25, or more than 25 consecutive nucleotides thereof including the polymorphic base. For example, the array may
comprise at least 300 of any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos.
1-50 and 51-100), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers
(including the sequences of SEQ ID Nos. 301-305/307-311) or the sequences complementary thereto, or a fragment
comprising at least 10, at least 15, at least 20, at least 25, or more than 25 consecutive nucleotides thereof including the
polymorphic base. In a further embodiment the array comprises at least 400 of the biallelic markers in the maps of the
present invention or a fragment comprising at least 10, at least 15, at least 20, at least 25, or more than 25 consecutive
nucleotides thereof including the polymorphic base. For example, the array may comprise at least 400 of any of the 653
biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100), the asthma-associated
biallelic markers, the PG1 biallelic markers, and the new Apo E biallelic markers (including the sequences of SEQ ID Nos.
301-305/307-311) or the sequences complementary thereto, or a fragment comprising at least 10, at least 15, at least 20,
at least 25, or more than 25 consecutive nucleotides thereof including the polymorphic base. In a further embodiment the
array comprises more than 400 of the biallelic markers in the maps of the present invention or a fragment comprising at
least 10, at least 15, at least 20, at least 25, or more than 25 consecutive nucleotides thereof including the polymorphic
base. For example, the array may comprise at least 400 of any of the 653 biallelic markers obtained above (which include
the sequences of SEQ ID Nos. 1-50 and 51-100), the asthma-associated biallelic markers, the PG1 biallelic markers, and
the new Apo E biallelic markers (including the sequences of SEQ ID Nos. 301-305/307-311) or the sequences
complementary thereto, or a fragment comprising at least 10, at least 15, at least 20, at least 25, or more than 25
consecutive nucleotides thereof including the polymorphic base. Each of the embodiments listed above may also include one
or more of the sequences of SEQ ID Nos. 306 and 312 in addition to those enumerated above.

Another embodiment of the present invention is an array comprising amplification primers for generating
amplification products containing the polymorphic bases of one or more, at least five, at least 10, at least 20, at least 100,
at least 200, at least 300, at least 400, or more than 400 of the biallelic markers in the maps of the present invention. For
example, the array may comprise amplification primers for generating amplification products containing the polymorphic
bases of one or more, at least five, at least 10, at least 20, at least 100, at least 200, at least 300, at least 400, or more
than 400 of any of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and
51-100 or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and
the new Apo E biallelic markers (including the sequences of SEQ ID Nos. 301-305/307-311 or the sequences
complementary thereto). In such arrays, the amplification primers included in the array are capable of amplifying the
biallelic marker sequences to be detected in the nucleic acid sample applied to the array (i.e. the amplification primers
correspond to the biallelic markers affixed to the array). For example, if the array is designed to detect the biallelic marker of
SEQ ID Nos. 1 and 51 it may also contain SEQ ID Nos. 101 and 151, the amplification primers capable of generating an
amplicon which includes SEQ ID Nos. 1 and 51. Thus, the arrays may include one or more of the amplification primers
of SEQ ID Nos. 101-200, 313-317, and 319-323 corresponding to the one or more biallelic markers of SEQ ID Nos. 1-50,
51-100, 301-305, and 307-311 which are included in the array. In other embodiments, the arrays may include
amplification primers capable of generating an amplification product which includes the biallelic markers SEQ ID Nos.
306 and 312 in addition to amplification primers capable of generating an amplification product containing each of the
markers enumerated above. Thus, in such embodiments, the arrays may further include the amplification primers of SEQ
ID Nos. 318 and 324.
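
The correspondence between a marker and its primers can be expressed as a simple lookup. The sketch below infers a
fixed offset of 50 between columns from the example above (SEQ ID Nos. 1 and 51 with primers 101 and 151) and from the
table rows shown earlier (49/99/149/199/249/299 and 50/100/150/200/250/300). That regular pattern is an assumption made
for illustration and should be checked against the Sequence Listing; the function and key names are hypothetical.

```python
def seq_ids_for_marker(first_allele_id):
    """Return the SEQ ID Nos. assumed to go with one of the 50 localized markers.

    `first_allele_id` is the SEQ ID No. of the first allele (1 to 50). Each
    later column is assumed to be offset by a further 50, as in the rows
    49/99/149/199/249/299 and 50/100/150/200/250/300.
    """
    if not 1 <= first_allele_id <= 50:
        raise ValueError("first allele SEQ ID No. must be between 1 and 50")
    n = first_allele_id
    return {
        "first_allele": n,
        "second_allele": n + 50,
        "upstream_amplification_primer": n + 100,
        "downstream_amplification_primer": n + 150,
        "microsequencing_primer_1": n + 200,
        "microsequencing_primer_2": n + 250,
    }


if __name__ == "__main__":
    print(seq_ids_for_marker(1))
    print(seq_ids_for_marker(50))
```

An array design tool could use such a lookup to place the matching amplification primers alongside each marker, while
the Apo E markers (SEQ ID Nos. 301-312) would need their own mapping.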

Another embodiment of the present invention is an array which includes microsequencing primers capable of
determining the identity of the polymorphic bases of one or more, at least five, at least 10, at least 20, at least 100, at least
200, at least 300, at least 400, or more than 400 of the biallelic markers in the maps of the present invention. For
example, the array may comprise microsequencing primers capable of determining the identity of the polymorphic bases of
one or more, at least five, at least 10, at least 20, at least 100, at least 200, at least 300, at least 400, or more than 400
of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100 or the
sequences complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the new Apo
E biallelic markers (including the sequences of SEQ ID Nos. 301-305/307-311 or the sequences complementary thereto).
The sequences of representative microsequencing primers which may be included in the array are listed in the Sequence
Listing as SEQ ID Nos. 201-300, 325-329, and 331-335. In other embodiments, the arrays may further include
microsequencing primers for determining the identity of the polymorphic bases of one or more of the sequences of SEQ
ID Nos. 306 and 312, such as the microsequencing primers of SEQ ID Nos. 330 and 336.

Arrays containing any combination of the above nucleic acids which permits the specific detection or
identification of the polymorphic bases of the biallelic markers in the maps of the present invention, including any
combination of the 653 biallelic markers obtained above (which include the sequences of SEQ ID Nos. 1-50 and 51-100
or the sequences complementary thereto), the asthma-associated biallelic markers, the PG1 biallelic markers, and the
new Apo E biallelic markers (including the sequences of SEQ ID Nos. 301-305 and 307-311 or the sequences complementary
thereto) are also within the scope of the present invention. Other embodiments of the arrays include nucleic acids which
permit the specific detection or identification of the polymorphic bases of one or more of SEQ ID Nos. 306 and 312 in
addition to the nucleic acids permitting the specific detection or identification of the polymorphic bases of the biallelic
markers listed in the preceding sentence. For example, the array may comprise both the biallelic markers and
amplification primers capable of generating amplification products containing the polymorphic bases of the biallelic
markers. Alternatively, the array may comprise both amplification primers capable of generating amplification products
containing the polymorphic bases of the biallelic markers and microsequencing primers capable of determining the
identities of the polymorphic bases of these markers.

Although the above examples describe arrays comprising specific groups of biallelic markers and, in some
embodiments, specific amplification primers and microsequencing primers, it will be appreciated that the present
invention encompasses arrays including any biallelic marker, group of biallelic markers, amplification primer, group of
amplification primers, microsequencing primer, or group of microsequencing primers described herein, as well as any
combination of the preceding nucleic acids.

Alternatively, the microsequencing procedures described above may be used to determine whether an individual
possesses a pattern of biallelic marker alleles associated with a detectable trait. In this approach, a PCR reaction is
performed on the DNA or RNA of the individual to be tested to amplify the desired biallelic markers or portions thereof.
The amplification product is hybridized to one or more oligonucleotides which are fixed to a surface and which have their
3' end one base from the position of the polymorphic bases of the biallelic markers. The oligonucleotides are extended one
base using a detectably labeled dNTP and a polymerase. Incorporation of a pattern of detectably labeled bases indicative of
a biallelic marker pattern associated with a detectable trait indicates that the individual suffers from a detectable trait
as the result of a particular mutation or that the individual is at risk for developing the detectable trait at a
subsequent time.
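The single-base extension readout described above can be illustrated with a minimal Python sketch. It assumes a template strand written 5' to 3', a microsequencing primer whose 3' end lies one base from the polymorphic position, and a polymerase that adds the complement of the adjacent template base. The function names, marker name, and example sequences are hypothetical and are not taken from the sequence listing.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extended_base(template: str, primer: str) -> str:
    """Return the labeled base added to the 3' end of the primer.

    The primer anneals to the template immediately 3' of the polymorphic
    base, so the polymerase reads the template base just 5' of the
    annealing site and incorporates its complement.
    """
    site = reverse_complement(primer)  # template region bound by the primer
    start = template.find(site)
    if start <= 0:
        raise ValueError("primer does not anneal next to a readable base")
    return COMPLEMENT[template[start - 1]]


def matches_trait_pattern(observed: dict, trait_pattern: dict) -> bool:
    """True if every trait-associated base was incorporated at its marker."""
    return all(base in observed.get(marker, set())
               for marker, base in trait_pattern.items())


# Hypothetical heterozygous individual: alleles C and T at one marker.
flank = "GGCTTAAGC"
primer = reverse_complement(flank)
allele_c = "GATTACA" + "C" + flank
allele_t = "GATTACA" + "T" + flank
observed = {
    "marker_1": {extended_base(allele_c, primer),
                 extended_base(allele_t, primer)}
}
```

In this sketch the set of incorporated bases per marker stands in for the pattern of detectably labeled bases read from the surface; comparing it with a trait-associated pattern is a simple membership check.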

In addition to their use in diagnostic techniques such as those described above, any of the arrays described above
may also be used to identify a haplotype (i.e., a set of alleles of biallelic markers) which is associated with a
particular trait. As described above, in such analyses nucleic acid samples are obtained from trait positive and trait
negative individuals and the alleles of biallelic markers present in each population are determined to identify a haplotype
which is statistically associated with the trait. The arrays may be employed in haplotype analyses as follows. Nucleic acid
samples obtained from trait positive and trait negative individuals are amplified with primers capable of generating
amplification products which include the polymorphic bases of the biallelic markers. The amplification products are labeled
with a reporter group and allowed to contact the biallelic marker probes which are attached to the support. As described
above, the biallelic marker probes to which the labeled amplification products specifically hybridize are determined to
indicate which alleles of the biallelic markers are present in the samples. The patterns of alleles of biallelic markers in
the trait positive and trait negative individuals are then determined to identify a haplotype having a statistically
significant association with the trait.
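As an illustration of how a statistically significant association might be assessed, the following Python sketch compares the count of a candidate haplotype in trait positive and trait negative samples using a Pearson chi-square test on a 2x2 table. The choice of test, the function names, and the counts are assumptions made for illustration; the text above does not prescribe a particular statistic.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: trait positive chromosomes carrying the haplotype
    b: trait positive chromosomes not carrying it
    c: trait negative chromosomes carrying the haplotype
    d: trait negative chromosomes not carrying it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: haplotype seen on 30 of 100 trait positive
# chromosomes and on 10 of 100 trait negative chromosomes.
chi2 = chi_square_2x2(30, 70, 10, 90)
p = p_value_1df(chi2)
```

With several candidate haplotypes tested at once, a correction for multiple comparisons would normally be applied before calling any association significant.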

Alternatively, as described above, the nucleic acid samples from trait positive and trait negative individuals may be
applied to an array comprising amplification primers capable of generating amplification products which include the
polymorphic bases of the biallelic markers. The identities of the polymorphic bases in the amplification products are then
determined using techniques such as the microsequencing procedures disclosed herein. Alternatively, amplification can be
conducted in liquid phase and microsequencing may be conducted on the array.

Alternatively, both amplification and microsequencing reactions may be performed in liquid phase. In such
embodiments, the labeled nucleotides incorporated in the microsequencing primers during the microsequencing reactions are
detected by hybridizing the extended microsequencing primers to sequences complementary to the microsequencing primers.
The sequences complementary to the microsequencing primers are immobilized on a support, such as those described above.
The amplification and microsequencing reactions performed in liquid phase may be multiplexed, allowing the samples to be
tested simultaneously for tens, hundreds, thousands or more biallelic markers.

Preferably, the array used in the haplotype analysis comprises one or more groups of biallelic markers known to be
located in proximity to one another in the genome. For example, the biallelic markers in the groups may be derived from a
single YAC insert, a single BAC insert or a BAC subclone. Alternatively, the biallelic markers in the groups may be derived
from adjacent ordered clones. The biallelic markers in the groups may be located within a genomic region spanning less than
1 kb, from 1 to 5 kb, from 5 to 10 kb, from 10 to 25 kb, from 25 to 50 kb, from 50 to 150 kb, from 150 to 250 kb, from 250
to 500 kb, from 500 kb to 1 Mb, or more than 1 Mb. In some embodiments, the biallelic markers in the groups comprise
biallelic markers which have been localized to the same chromosome, subchromosomal region, or gene.

It will be appreciated that the ordered DNA containing the biallelic markers need not completely cover the genomic
regions of these lengths but may instead be incomplete contigs having one or more gaps therein.
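One way to form groups of markers that fall within a chosen genomic span is sketched below in Python. It assumes that each marker has a known position in kb along an ordered contig; the greedy grouping rule, the function name, and the marker names and positions are hypothetical and serve only to illustrate the idea of proximity groups.

```python
def group_by_span(markers, max_span_kb):
    """Greedily split markers into groups spanning at most max_span_kb.

    markers: iterable of (name, position_kb) pairs.
    Returns a list of groups, each a list of (name, position_kb) pairs
    sorted by position. A new group starts whenever the next marker
    lies more than max_span_kb from the first marker of the current group.
    """
    groups = []
    current = []
    for name, pos in sorted(markers, key=lambda m: m[1]):
        if current and pos - current[0][1] > max_span_kb:
            groups.append(current)
            current = []
        current.append((name, pos))
    if current:
        groups.append(current)
    return groups


# Hypothetical markers and positions (kb).
example = [("m4", 300), ("m1", 0), ("m3", 45), ("m2", 20), ("m5", 310)]
groups_50kb = group_by_span(example, 50)
```

Because gaps in the contig do not change marker order, the same grouping applies to incomplete contigs as long as approximate positions are available.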

In some embodiments, the biallelic markers known to be located in proximity to one another in the genome may be
located in physical proximity on the array. For example, the array may comprise one or more groups of at least 3 biallelic
markers known to be located in proximity to one another in the genome. In some embodiments, the array may comprise one
or more groups of at least 8 biallelic markers known to be located in proximity to one another in the genome. In other
embodiments, the array may comprise one or more groups of at least 20 biallelic markers known to be located in proximity
to one another in the genome.

The array may comprise one or more groups of biallelic markers known to be located on the same subchromosomal
region. For example, the array could comprise two or more biallelic markers located at 21q11.2 (selected from the group
consisting of SEQ ID Nos. 29, 79, 30, and 80), two or more markers located at 21q21 (selected from the group consisting of
SEQ ID Nos. 1, 51, 2, 52, 3, and 53), two or more markers located at 21q21.2 (selected from the group consisting of SEQ ID
Nos. 17, 67, 18, 68, 19, 69, 20, 70, 21, and 71), two or more markers located at 21q21.3-q22.13 (selected from the group
consisting of SEQ ID Nos. 25, 75, 26, 76, 27, 77, 28, 78, 31, 81, 32, 82, 38, 88, 39, 89, 40, 90, 48, 98, 49, 99, 50, 100,
22, 72, 23, 73, 24, 74, 4, 54, 5, 55, 6, 56, 7, 57, 8, 58, 9, 59, 10, 60, 11, 61, 12, 62, 13, 63, 14, 64, 15, 65, 16, and
66), two or more markers located at 21q22.2 (selected from the group consisting of SEQ ID Nos. 41, 91, 42, 92, 43, 93, 44,
94, 45, 95, 46, 96, 47, and 97), and two or more markers located at 21q22.3 (selected from the group consisting of SEQ
ID Nos. 33, 83, 34, 84, 35, 85, 36, 86, 37, and 87). Alternatively, the array could comprise amplification primers capable
of generating an amplification product containing the polymorphic bases of two or more biallelic markers located at 21q11.2
(for example, amplification primers capable of generating an amplification product containing the polymorphic bases of two
or more biallelic markers selected from the group consisting of SEQ ID Nos. 29, 79, 30, and 80), two or more markers located
at 21q21 (for example, amplification primers capable of generating an amplification product containing the polymorphic
bases of two or more biallelic markers selected from the group consisting of SEQ ID Nos. 1, 51, 2, 52, 3, and 53), two or
more markers located at 21q21.2 (for example, amplification primers capable of generating an amplification product
containing the polymorphic bases of two or more biallelic markers selected from the group consisting of SEQ ID Nos. 17, 67,
18, 68, 19, 69, 20, 70, 21, and 71), two or more markers located at 21q21.3-q22.13 (for example, amplification primers
capable of generating an amplification product containing the polymorphic bases of two or more biallelic markers selected
from the group consisting of SEQ ID Nos. 25, 75, 26, 76, 27, 77, 28, 78, 31, 81, 32, 82, 38, 88, 39, 89, 40, 90, 48, 98, 49,
99, 50, 100, 22, 72, 23, 73, 24, 74, 4, 54, 5, 55, 6, 56, 7, 57, 8, 58, 9, 59, 10, 60, 11, 61, 12, 62, 13, 63, 14, 64, 15,
65, 16, and 66), two or more markers located at 21q22.2 (for example, amplification primers capable of generating an
amplification product containing the polymorphic bases of two or more biallelic markers selected from the group consisting
of SEQ ID Nos. 41, 91, 42, 92, 43, 93, 44, 94, 45, 95, 46, 96, 47, and 97), and two or more markers located at 21q22.3
(for example, amplification primers capable of generating an amplification product containing the polymorphic bases of two
or more biallelic markers selected from the group consisting of SEQ ID Nos. 33, 83, 34, 84, 35, 85, 36, 86, 37, and 87).

In some embodiments, the array may comprise one or more groups of biallelic markers derived from the same BAC
insert. For example, the array could comprise two or more markers selected from the group consisting of SEQ ID Nos. 29,
79, 30, and 80 (derived from BAC 1), two or more markers selected from the group consisting of SEQ ID Nos. 1 and 51
(derived from BAC 2), two or more markers selected from the group consisting of SEQ ID Nos. 2, 52, 3, and 53 (derived
from BAC 3), two or more markers selected from the group consisting of SEQ ID Nos. 17, 67, 18, 68, 19, 69, 20, 70, 21,
and 71 (derived from BAC 4), two or more markers selected from the group consisting of SEQ ID Nos. 25, 75, 26, 76, 27,
and 77 (derived from BAC 5), two or more markers selected from the group consisting of SEQ ID Nos. 28, 78, 31, 81, 32, and
82 (derived from BAC 6), two or more markers selected from the group consisting of SEQ ID Nos. 38, 88, 39, 89, 40, and
90 (derived from BAC 7), two or more markers selected from the group consisting of SEQ ID Nos. 48, 98, 49, 99, 50, and
100 (derived from BAC 8), two or more markers selected from the group consisting of SEQ ID Nos. 22, 72, 23, 73, 24, and
74 (derived from BAC 9), two or more markers selected from the group consisting of SEQ ID Nos. 4, 54, 5, 55, 6, 56, 7, 57,
8, 58, 9, 59, 10, and 60 (derived from BAC 10), two or more markers selected from the group consisting of SEQ ID Nos.
11, 61, 12, 62, 13, 63, 14, 64, 15, 65, 16, and 66 (derived from BAC 11), two or more markers selected from the group
consisting of SEQ ID Nos. 41, 91, 42, 92, 43, 93, 44, 94, 45, 95, 46, 96, 47, and 97 (derived from BAC 12), or two or more
markers selected from the group consisting of SEQ ID Nos. 33, 83, 34, 84, 35, 85, 36, 86, 37, and 87 (derived from BAC
13).

Arrays comprising biallelic markers known to be located in proximity to one another in the genome permit
haplotyping analyses to be conducted even when the chromosomal locations of the biallelic markers have not been
determined. For example, using the procedures described above, the alleles of sets of biallelic markers which are present in
nucleic acid samples from trait positive and trait negative individuals may be determined using a succession of arrays, with
each array having one or more groups of nucleic acids known to be located in proximity to one another thereon. The
succession of arrays may comprise biallelic markers spanning the entire genome having any of the average intermarker
distances specified above. Alternatively, the succession of arrays need not span the entire genome but may instead be
derived from two or more contigated YAC, BAC, or BAC subclone inserts. A statistical analysis is performed on the alleles
of biallelic markers present in the trait positive and trait negative individuals to identify a haplotype having a
statistically significant association with the trait. Once a statistically significant haplotype is identified, the genomic
locations of the biallelic markers comprising the haplotype may be determined using the methods described herein. In
addition, using the procedures described herein, the genomic region harboring the biallelic markers in the statistically
significant haplotype may be evaluated to identify the genes associated with the trait.

Although this invention has been described in terms of certain preferred embodiments, other embodiments which
will be apparent to those of ordinary skill in the art in view of the disclosure herein are also within the scope of this
invention. Accordingly, the scope of the invention is intended to be defined only by reference to the appended claims.
Table 1

Biallelic marker   BAC   Insert size   Average intermarker   Subchromosomal
(Genset code)            (kb)          distance (kb)         localization

99-2378            1     150           75                    21q11.2
99-2381            1     150           75                    21q11.2

99-2103            2     110           110                   21q21

99-2228            3     105           52.5                  21q21
99-2229            3     105           52.5                  21q21

99-2312            4     130           26                    21q21.2
99-2315            4     130           26                    21q21.2
99-2320            4     130           26                    21q21.2
99-2321            4     130           26                    21q21.2
99-2324            4     130           26                    21q21.2

99-2362            5     100           33.3                  21q21.3-q22.13
99-2364            5     100           33.3                  21q21.3-q22.13
99-2367            5     100           33.3                  21q21.3-q22.13

99-2371            6     135           45                    21q22.11-q22.13
99-2413            6     135           45                    21q22.11-q22.13
99-2419            6     135           45                    21q22.11-q22.13

99-2610            7     185           61.7                  21q22.11-q22.13
99-2515            7     185           61.7                  21q22.11-q22.13
99-2620            7     185           61.7                  21q22.11-q22.13

99-2645            8     250           83.3                  21q22.11-q22.13
99-2647            8     250           83.3                  21q22.11-q22.13
99-2649            8     250           83.3                  21q22.11-q22.13

99-2333            9     140           46.7                  21q22.11-q22.13
99-2341            9     140           46.7                  21q22.11-q22.13
99-2342            9     140           46.7                  21q22.11-q22.13

99-2240            10    95            13.6                  21q22.11-q22.13
99-2242            10    95            13.6                  21q22.11-q22.13
99-2244            10    95            13.6                  21q22.11-q22.13
99-2246            10    95            13.6                  21q22.11-q22.13
99-2248            10    95            13.6                  21q22.11-q22.13
99-2250            10    95            13.6                  21q22.11-q22.13
99-2251            10    95            13.6                  21q22.11-q22.13

99-2269            11    40            6.7                   21q22.11-q22.13
99-2271            11    40            6.7                   21q22.11-q22.13
99-2272            11    40            6.7                   21q22.11-q22.13
99-2273            11    40            6.7                   21q22.11-q22.13
99-2275            11    40            6.7                   21q22.11-q22.13
99-2278            11    40            6.7                   21q22.11-q22.13

99-2624            12    165           23.6                  21q22.2
99-2625            12    165           23.6                  21q22.2
99-2630            12    165           23.6                  21q22.2
99-2633            12    165           23.6                  21q22.2
99-2634            12    165           23.6                  21q22.2
99-2637            12    165           23.6                  21q22.2
99-2642            12    165           23.6                  21q22.2

99-2559            13    205           41                    21q22.3
99-2566            13    205           41                    21q22.3
99-2567            13    205           41                    21q22.3
99-2570            13    205           41                    21q22.3
99-2571            13    205           41                    21q22.3
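The average intermarker distances in Table 1 appear to equal the BAC insert size divided by the number of biallelic markers listed for that BAC. The short Python check below works through that arithmetic; the marker counts are tallied from the rows of Table 1, and the rule itself is an inference from the tabulated values rather than a statement made in the text.

```python
# BAC number -> (insert size in kb, markers listed for the BAC,
#                average intermarker distance in kb as given in Table 1)
TABLE_1 = {
    1: (150, 2, 75.0),
    2: (110, 1, 110.0),
    3: (105, 2, 52.5),
    4: (130, 5, 26.0),
    5: (100, 3, 33.3),
    6: (135, 3, 45.0),
    7: (185, 3, 61.7),
    8: (250, 3, 83.3),
    9: (140, 3, 46.7),
    10: (95, 7, 13.6),
    11: (40, 6, 6.7),
    12: (165, 7, 23.6),
    13: (205, 5, 41.0),
}


def average_intermarker_distance(insert_size_kb: float, marker_count: int) -> float:
    """Insert size divided by marker count, rounded to one decimal place."""
    return round(insert_size_kb / marker_count, 1)


computed = {
    bac: average_intermarker_distance(size, count)
    for bac, (size, count, _listed) in TABLE_1.items()
}
```

For example, BAC 10 carries seven markers on a 95 kb insert, giving 95 / 7, or about 13.6 kb per marker.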
WHAT IS CLAIMED IS:

1. A method of obtaining a set of biallelic markers comprising the steps of:

obtaining a nucleic acid library comprising a plurality of genomic DNA fragments comprising the full genome or
a portion thereof;

determining the order of said plurality of genomic DNA fragments in the genome;
determining the sequence of selected regions of said plurality of genomic DNA fragments; and
identifying nucleotides in said plurality of genomic DNA fragments which vary between individuals, thereby
defining a set of biallelic markers.

2. The method of Claim 1, wherein said identifying step comprises identifying about 20,000 biallelic
markers.

3. The method of Claim 1, wherein said identifying step comprises identifying about 40,000 biallelic
markers.

4. The method of Claim 1, wherein said identifying step comprises identifying about 60,000 biallelic
markers.

5. The method of Claim 1, wherein said identifying step comprises identifying about 80,000 biallelic
markers.

6. The method of Claim 1, wherein said identifying step comprises identifying about 100,000 biallelic
markers.

7. The method of Claim 1, wherein said identifying step comprises identifying about 120,000 biallelic
markers.

8. The method of Claim 1, wherein said biallelic markers are separated from one another by an average
distance of 10 kb-200 kb.

9. The method of Claim 1, wherein said biallelic markers are separated from one another by an average
distance of 15 kb-150 kb.

10. The method of Claim 1, wherein said biallelic markers are separated from one another by an average
distance of 20 kb-100 kb.

11. The method of Claim 1, wherein said biallelic markers are separated from one another by an average
distance of 100 kb-150 kb.

12. The method of Claim 1, wherein said biallelic markers are separated from one another by an average
distance of 50-100 kb.

13. The method of Claim 1, wherein said biallelic markers are separated from one another by an average
distance of 25 kb-50 kb.

14. The method of Claim 1, wherein the step of determining the sequence of selected regions of said
plurality of genomic DNA fragments comprises inserting fragments of said plurality of genomic DNA fragments into a
vector to generate a plurality of subclones and determining the sequence of a region of the inserts in said plurality of
subclones or a subset thereof.

15. The method of Claim 14, wherein said step of determining the sequence of a region of said inserts or
a subset thereof comprises determining the sequence of one or both end regions of said inserts or a subset thereof.

16. The method of Claim 14, wherein the step of determining the sequence of one or both end regions of
said plurality of subclones comprises determining the sequence of about 500 bases at each end of said subclones or a
subset thereof.

17. The method of Claim 1, wherein a set of about 10,000 to about 20,000 genomic DNA inserts with
an average size between 100 kb and 300 kb are ordered.

18. The method of Claim 1, wherein a set of about 10,000 to about 30,000 genomic DNA inserts with
an average size between 100 kb and 150 kb are ordered.

19. The method of Claim 1, wherein a set of about 15,000 to about 25,000 genomic DNA inserts with
an average size between 100 kb and 200 kb are ordered.

20. The method of Claim 1, wherein said identifying step comprises identifying between 1 and 8 biallelic
markers per genomic DNA fragment.

21. The method of Claim 1, wherein said identifying step comprises identifying an average of 3 biallelic
markers per genomic DNA insert.

22. The method of Claim 1, wherein said genomic DNA fragments are in a Bacterial Artificial
Chromosome.

23. The method of Claim 1, wherein said genomic DNA fragments are in a Yeast Artificial Chromosome.

24. The method of Claim 1, further comprising determining the position of said biallelic markers along the
genome or a portion thereof.

25. The method of Claim 24, wherein the step of determining the position of said biallelic markers along
the genome or portion thereof comprises determining the position of said biallelic markers along a chromosome.

26. The method of Claim 24, wherein the step of determining the position of said biallelic markers along
the genome or portion thereof comprises determining the position of said biallelic markers along a subchromosomal
region.

27. The method of Claim 1, further comprising identifying biallelic markers which are in linkage
disequilibrium with one another.

28. The method of Claim 27, further comprising obtaining pluralities of biallelic markers such that each
marker is in linkage disequilibrium with at least one of the identified markers.

29. The method of Claim 1, wherein said portion of the genome comprises at least 200 kb of contiguous 
genomic DNA. 

30. The method of Claim 1, wherein said portion of the genome comprises at least 300 kb of contiguous
genomic DNA.

31. The method of Claim 1, wherein said portion of the genome comprises at least 500 kb of contiguous
genomic DNA.

32. The method of Claim 1, wherein said portion of the genome comprises at least 2 Mb of contiguous
genomic DNA.

33. The method of Claim 1, wherein said portion of the genome comprises at least 5 Mb of contiguous
genomic DNA.

34. The method of Claim 1, wherein said portion of the genome comprises at least 10 Mb of contiguous
genomic DNA.

35. The method of Claim 1, wherein said portion of the genome comprises at least 20 Mb of contiguous
genomic DNA.

36. The method of Claim 1, further comprising the step of identifying one or more groups of biallelic
markers which are in proximity to one another in the genome.

37. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning less than 1 kb.

38. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 1 to 5 kb.

39. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 5 to 10 kb.

40. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 10 to 25 kb.

41. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 25 to 50 kb.

42. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 50 to 150 kb.

43. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 150 to 250 kb.

44. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 250 to 500 kb.

45. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 500 kb to 1 Mb.

46. The method of Claim 36, wherein the biallelic markers in each of these groups are located within a
genomic region spanning more than 1 Mb.

47. A method of obtaining a set of biallelic markers comprising the steps of:

obtaining a nucleic acid library comprising genomic DNA fragments comprising the full genome or a portion
thereof;

determining the sequence of selected regions of said genomic DNA fragments;
identifying nucleotides in said genomic DNA fragments which vary between individuals, thereby defining a set
of biallelic markers; and

determining the order of said biallelic markers along the genome or portion thereof.

48. A set of biallelic markers obtained by the method of Claim 1.

49. The set of biallelic markers of Claim 48, wherein the markers in said set have a known genomic
position.

50. The set of biallelic markers of Claim 48, wherein they are ordered relative to one another.

51. A set of biallelic markers having a known relationship to one another and a known genomic position,
said set of biallelic markers being obtained by the method of Claim 1.

52. The set of biallelic markers of Claim 48, wherein said biallelic markers have heterozygosity rates of
at least about 0.18.

53. The set of biallelic markers of Claim 48, wherein said biallelic markers have a heterozygosity rate of at
least about 0.32.

54. The set of biallelic markers of Claim 48, wherein said biallelic markers have a heterozygosity rate of
at least about 0.42.

55. A map comprising an ordered array of at least 20,000 biallelic markers obtained by the method of
Claim 1.

56. The map of Claim 55, comprising an ordered array of at least 60,000 biallelic markers obtained by
the method of Claim 1.

57. The map of Claim 55, comprising an ordered array of at least 100,000 biallelic markers obtained by
the method of Claim 1.

58. The map of Claim 55, wherein said biallelic markers are distributed at an average marker density of
one marker every 150 kb.

59. The map of Claim 56, wherein said biallelic markers are distributed at an average marker density of
one marker every 50 kb.

60. The map of Claim 57, wherein said biallelic markers are distributed at an average marker density of
one marker every 25 kb.

61. A method of identifying one or more biallelic markers associated with a detectable trait comprising
the steps of:

determining the frequencies of each allele of one or more biallelic markers obtained by the method of Claim 1
in individuals who express said detectable trait and individuals who do not express said detectable trait; and

identifying one or more alleles of said one or more biallelic markers which are statistically associated with the
expression of said detectable trait.

62. The method of Claim 61, wherein said detectable trait is selected from the group consisting of
disease, drug response, drug efficacy, and drug toxicity.

63. The method of Claim 61, wherein the phenotype of said individuals who express said detectable trait
and the phenotype of said individuals who do not express said detectable trait are readily distinguishable from one
another.

64. The method of Claim 61, wherein the individuals who express said detectable trait and the individuals
who do not express said detectable trait are selected from a bimodal phenotype distribution.

65. The method of Claim 61, wherein said individuals who express said detectable trait are at one
phenotypic extreme of the population and said individuals who do not express said detectable trait are at the other
phenotypic extreme of the population.

66. A method of identifying a haplotype associated with a trait comprising the steps of:

obtaining nucleic acid samples from trait positive and trait negative individuals;

determining the frequencies of the alleles of each member of a group of biallelic markers obtained by the
method of Claim 1 known to be located in proximity to one another in the genome in said nucleic acid samples; and

identifying a plurality of alleles of biallelic markers having a statistically significant association with said trait.

67. The method of Claim 66, wherein said detectable trait is selected from the group consisting of
disease, drug response, drug efficacy, and drug toxicity.

68. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning less than 1 kb.

69. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 1 to 5 kb.

70. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 5 to 10 kb.

71. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 10 to 25 kb.

72. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 25 to 50 kb.

73. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 50 to 150 kb.

74. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 150 to 250 kb.

75. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 250 to 500 kb.

76. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning from 500 kb to 1 Mb.

77. The method of Claim 66, wherein the biallelic markers in each of these groups are located within a
genomic region spanning more than 1 Mb.
78. A method of identifying one or more biallelic markers associated with a detectable trait comprising
the steps of:

selecting a gene in which mutations result in a detectable trait or a gene suspected of being associated with a
detectable trait; and

identifying one or more biallelic markers obtained by the method of Claim 1 within the genomic region
harboring said gene which are associated with said detectable trait.

79. The method of Claim 78, wherein said detectable trait is selected from the group consisting of
disease, drug response, drug efficacy, and drug toxicity.

80. The method of Claim 78, wherein said identifying step comprises:

determining the frequencies of said one or more biallelic markers in individuals who express said detectable
trait and individuals who do not express said detectable trait; and

identifying one or more biallelic markers which are statistically associated with the expression of said
detectable trait.

81. An array of nucleic acids fixed to a support, said nucleic acids comprising at least 8 consecutive 
nucleotides, including the polymorphic nucleotide, of one or moro biallelic markers obtained by the method of Claim 1. 

82. The array of Claim 81, wherein said nucleic acids comprise et least 8 consecutive nucleotides, 
including the polymorphic nucleotide, of at least five biallolic markers obtained by the method of Claim 1. 

03. The array of Claim 61. wherein said nucleic acids comprise at least 8 conscculivo nucleotides, 
including the polymorphic nucleotide, of at least ten biallelic markers obtained by the method of Claim 1. 

84. An array of nucleic acids fixed to a support, said nucleic acids comprising at least 8 consecutive
nucleotides, including the polymorphic nucleotide, of one or more groups of biallelic markers known to be located in
proximity to one another in the genome.

85. An array of nucleic acids fixed to a support, said nucleic acids comprising amplification primers for 
generating an amplification product comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide,
of one or more biallelic markers obtained by the method of Claim 1.

86. An array of nucleic acids fixed to a support, said nucleic acids comprising amplification primers for
generating an amplification product comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide,
of one or more groups of biallelic markers known to be located in proximity to one another in the genome.

87. An array of nucleic acids fixed to a support, said nucleic acids comprising one or more 
microsequencing primers for determining the identity of the polymorphic base of one or more nucleic acids comprising at
least 8 consecutive nucleotides, including the polymorphic nucleotide, of one or more biallelic markers obtained by the
method of Claim 1. 

88. An array of nucleic acids fixed to a support, said nucleic acids comprising one or more
microsequencing primers for determining the identity of the polymorphic bases of one or more groups of biallelic markers
known to be located in proximity to one another in the genome. 

89. An array of nucleic acids fixed to a support, wherein said nucleic acids are complementary to one or
more microsequencing primers for determining the identities of the polymorphic bases of one or more biallelic markers
obtained by the method of Claim 1.

90. The array of Claim 89, wherein said nucleic acids are complementary to at least five microsequencing
primers for determining the identities of the polymorphic bases of at least five biallelic markers obtained by the method
of Claim 1.

91. The array of Claim 89, wherein said nucleic acids are complementary to at least ten microsequencing
primers for determining the identities of the polymorphic bases of at least ten biallelic markers obtained by the method
of Claim 1.

92. An array of nucleic acids fixed to a support, said nucleic acids comprising one or more nucleic acids
complementary to one or more microsequencing primers for determining the identity of the polymorphic bases of one or
more groups of biallelic markers known to be located in proximity to one another in the genome.

93. The array of any one of Claims 84, 86, 88, and 92, wherein the members of each of said one or more
groups of biallelic markers are located in physical proximity to one another on said support.

94. The array of any one of Claims 84, 86, 88, and 92, wherein said biallelic markers in each of these
groups are located within a genomic region spanning less than 1kb.

95. The array of any one of Claims 84, 86, 88, and 92, wherein said biallelic markers in each of these
groups are located within a genomic region spanning from 1 to 5kb.

96. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning from 5 to 10kb.

97. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning from 10 to 25kb. 

98. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning from 25 to 50kb. 

99. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning from 50 to 150kb.

100. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning from 150 to 250kb. 

101. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning from 250 to 500kb.

102. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning from 500kb to 1Mb.

103. The array of any one of Claims 84, 86, 88, and 92, wherein the biallelic markers in each of these
groups are located within a genomic region spanning more than 1Mb. 

104. The array of any one of Claims 84, 86, 88, and 92, wherein each group of biallelic markers
comprises at least 3 biallelic markers. 

105. The array of any one of Claims 84, 86, 88, and 92, wherein each group of biallelic markers
comprises at least 6 biallelic markers.

106. The array of any one of Claims 84, 86, 88, and 92, wherein each group of biallelic markers
comprises at least 20 biallelic markers.

107. A method for determining whether an individual is at risk of developing a detectable trait or suffers
from a detectable trait associated with said trait comprising the steps of:

obtaining a nucleic acid sample from said individual;

screening said nucleic acid sample with one or more biallelic markers obtained by the method of Claim 1; and
determining whether said nucleic acid sample contains one or more biallelic markers statistically
associated with said detectable trait.

108. The method of Claim 107, wherein said detectable trait is selected from the group consisting of
disease, drug response, drug efficacy, and drug toxicity.

109. The method of Claim 107, wherein said biallelic markers were obtained by the method of Claim 61.

110. The method of Claim 107, wherein said biallelic markers were obtained by the method of Claim 70.

111. A method of using a drug comprising: 
obtaining a nucleic acid sample from an individual;

determining the identity of the polymorphic base of one or more biallelic markers obtained by the method of
Claim 1 which is associated with a positive response to treatment with said drug or one or more biallelic markers
obtained by the method of Claim 1 which is associated with a negative response to treatment with said drug; and

administering said drug to said individual if said nucleic acid sample contains one or more biallelic markers
associated with a positive response to treatment with said drug or if said nucleic acid sample lacks one or more biallelic
markers associated with a negative response to said drug.

112. The method of Claim 111, wherein said determining step comprises determining the identity of the 
polymorphic base of one or more biallelic markers obtained by the method of Claim 62 which is associated with a
positive response to treatment with said drug or one or more biallelic markers obtained by the method of Claim 62 which
is associated with a negative response to treatment with said drug. 

113. The method of Claim 111, wherein said determining step comprises determining the identity of the
polymorphic base of one or more biallelic markers obtained by the method of Claim 79 which is associated with a
positive response to treatment with said drug or one or more biallelic markers obtained by the method of Claim 79 which
is associated with a negative response to treatment with said drug.

114. A method of selecting an individual for inclusion in a clinical trial of a drug comprising:
obtaining a nucleic acid sample from an individual;

determining the identity of the polymorphic base of one or more biallelic markers obtained by the method of
Claim 1 which is associated with a positive response to treatment with said drug or one or more biallelic markers
associated with a negative response to treatment with said drug in said nucleic acid sample; and

including said individual in said clinical trial if said nucleic acid sample contains one or more biallelic markers
obtained by the method of Claim 1 which is associated with a positive response to treatment with said drug or if said
nucleic acid sample lacks one or more biallelic markers associated with a negative response to said drug.

115. The method of Claim 114, wherein said determining step comprises determining the identity of the
polymorphic base of one or more biallelic markers obtained by the method of Claim 62 which is associated with a
positive response to treatment with said drug or one or more biallelic markers obtained by the method of Claim 62 which
is associated with a negative response to treatment with said drug.

116. The method of Claim 114, wherein said determining step comprises determining the identity of the
polymorphic base of one or more biallelic markers obtained by the method of Claim 79 which is associated with a
positive response to treatment with said drug or one or more biallelic markers obtained by the method of Claim 79 which
is associated with a negative response to treatment with said drug.

117. A method of identifying a gene associated with a detectable trait comprising the steps of: 
determining the frequency of each allele of one or more biallelic markers obtained by the method of Claim 1 in
individuals having said detectable trait and individuals lacking said detectable trait;

identifying one or more alleles of one or more biallelic markers having a statistically significant association
with said detectable trait; and

identifying a gene in linkage disequilibrium with said one or more alleles.
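
The linkage disequilibrium referred to in Claim 117 is commonly quantified with the coefficients D, D' and r². The claims do not prescribe a measure, so the sketch below is only an illustration under that assumption; the function name `linkage_disequilibrium` and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at the first marker
          and allele B at the second marker
    p_a:  frequency of allele A, p_b: frequency of allele B
    Returns (D, |D'|, r2). These standard measures are an assumption here;
    the source does not state which measure is used.
    """
    d = p_ab - p_a * p_b
    # D is normalised by its largest attainable magnitude given the
    # allele frequencies, which depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical frequencies for illustration only.
d, d_prime, r2 = linkage_disequilibrium(p_ab=0.4, p_a=0.5, p_b=0.5)
print(d, d_prime, r2)
```

Values of |D'| or r² near 1 indicate that the two markers are rarely separated by recombination, which is the property exploited when a trait-associated allele is used to locate a nearby gene.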

118. The method of Claim 78, further comprising identifying a mutation in the gene which is associated
with said detectable trait. 

119. The method of Claim 78, wherein said detectable trait is selected from the group consisting of
disease, drug response, drug efficacy, and drug toxicity. 

120. A method of identifying a gene associated with a detectable trait comprising: 
selecting a gene suspected of being associated with a detectable trait; and

identifying one or more biallelic markers obtained by the method of Claim 1 within the genomic region
harboring said gene which are associated with said detectable trait. 

121. The method of Claim 120, wherein said detectable trait is selected from the group consisting of 
disease, drug response, drug efficacy, and drug toxicity.

122. The method of Claim 120, wherein said identifying step comprises:

determining the frequencies of said one or more biallelic markers in individuals who express said detectable
trait and individuals who do not express said detectable trait; and

identifying one or more biallelic markers which are statistically associated with the expression of said
detectable trait.

123. A method of identifying a haplotype associated with a trait comprising the steps of:
obtaining nucleic acid samples from trait positive and trait negative individuals;

conducting an amplification reaction on said nucleic acid samples using amplification primers capable of
generating amplification products containing the polymorphic bases of a plurality of biallelic markers;

contacting one or more arrays according to Claim 84 with said amplification products;
determining the identities of the polymorphic bases of said amplification products; and
identifying a haplotype having a statistically significant association with said trait.

124. A method of identifying a haplotype associated with a trait comprising the steps of:
obtaining nucleic acid samples from trait positive and trait negative individuals;

conducting amplification reactions on said nucleic acid samples using amplification primers capable of
generating amplification products containing the polymorphic bases of a plurality of biallelic markers;

contacting one or more arrays according to Claim 88 with said amplification products;

conducting microsequencing reactions on said amplification products using microsequencing primers on said
arrays, thereby generating elongated microsequencing primers comprising the polymorphic bases of said amplification
products;

determining the identities of said polymorphic bases; and

identifying a haplotype having a statistically significant association with said trait.

125. A method of identifying a haplotype associated with a trait comprising the steps of:
obtaining nucleic acid samples from trait positive and trait negative individuals;

conducting amplification reactions on said nucleic acid samples using amplification primers which are capable
of generating amplification products containing the polymorphic bases of a plurality of biallelic markers;

conducting microsequencing reactions on said nucleic acid samples, thereby generating microsequencing
products containing the polymorphic bases of one or more biallelic markers at their 3' ends, said polymorphic bases being
detectably labeled;

contacting one or more arrays according to Claim 92 with said microsequencing products such that said
microsequencing products specifically hybridize to said nucleic acids complementary to said microsequencing primers;

determining the identities of the polymorphic bases of said microsequencing products; and

identifying a haplotype having a statistically significant association with said trait.

126. A method of identifying a haplotype associated with a trait comprising the steps of:

obtaining nucleic acid samples from trait positive and trait negative individuals;

contacting one or more arrays according to Claim 86 with said nucleic acid sample;
conducting an amplification reaction on said nucleic acid samples using amplification primers on said array
which are capable of generating amplification products containing the polymorphic bases of a plurality of biallelic
markers;

determining the identities of the polymorphic bases of said amplification products; and
identifying a haplotype having a statistically significant association with said trait.
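
The final step of Claims 123 to 126, identifying a haplotype with a statistically significant association, can be sketched as follows. The claims do not specify the statistical procedure; the chi-square statistic and the label-permutation test below are assumptions made for illustration, as are the function names, the default of 100 iterations, and the example haplotypes.

```python
import random
from collections import Counter

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def best_haplotype(cases, controls):
    """Return (haplotype, chi2) for the haplotype most associated with the trait.

    cases and controls are lists of haplotype strings, one per chromosome.
    """
    case_n, ctrl_n = Counter(cases), Counter(controls)
    best = (None, 0.0)
    for hap in sorted(set(cases) | set(controls)):
        a, c = case_n[hap], ctrl_n[hap]
        # Compare this haplotype against all other haplotypes pooled.
        stat = chi2_2x2(a, len(cases) - a, c, len(controls) - c)
        if stat > best[1]:
            best = (hap, stat)
    return best

def permutation_p_value(cases, controls, iterations=100, seed=0):
    """Empirical p-value for the best haplotype under shuffled trait labels."""
    rng = random.Random(seed)
    observed = best_haplotype(cases, controls)[1]
    pooled = list(cases) + list(controls)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        stat = best_haplotype(pooled[:len(cases)], pooled[len(cases):])[1]
        if stat >= observed:
            hits += 1
    return (hits + 1) / (iterations + 1)

# Hypothetical two-marker haplotypes for illustration only.
cases = ["AG"] * 15 + ["GA"] * 5
controls = ["AG"] * 5 + ["GA"] * 15
print(best_haplotype(cases, controls))
print(permutation_p_value(cases, controls))
```

Shuffling the trait labels keeps the haplotype pool fixed, so the empirical p-value accounts for the fact that the best of several haplotypes was selected.
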
127. A method of determining whether an individual is at risk of developing Alzheimer's disease or whether
the individual suffers from Alzheimer's disease as a result of possessing the Apo E ε4 Site A allele comprising:
obtaining a nucleic acid sample from said individual; and
determining the identity of the polymorphic base in one or more of the sequences selected from the group
consisting of SEQ ID Nos. 301-305 and SEQ ID Nos. 307-311 or the sequences complementary thereto in said nucleic
acid sample.

128. The method of Claim 127, further comprising determining whether said nucleic acid sample contains
the sequence of SEQ ID No. 306 or the sequence complementary thereto.

129. The method of Claim 127, wherein said step of determining the identity of the polymorphic bases in
one or more of the sequences selected from the group consisting of SEQ ID Nos. 301-305 and SEQ ID Nos. 307-311 or
the sequences complementary thereto comprises determining whether said nucleic acid sample contains the sequence of
SEQ ID No. 311 or the sequence complementary thereto.

130. The method of Claim 129, further comprising determining whether said nucleic acid sample contains
the sequence of SEQ ID No. 306 or the sequence complementary thereto. 

131. An isolated nucleic acid comprising a sequence selected from the group consisting of SEQ ID No.
301, SEQ ID No. 307, the sequences complementary thereto, and fragments comprising at least 8 consecutive
nucleotides, including the polymorphic nucleotide, thereof. 

132. An isolated nucleic acid comprising a sequence selected from the group consisting of SEQ ID No. 302,
SEQ ID No. 306, the sequences complementary thereto, and fragments comprising at least 8 consecutive nucleotides
thereof. 

133. An isolated nucleic acid comprising a sequence selected from the group consisting of SEQ ID No.
303, SEQ ID No. 309, the sequences complementary thereto, and fragments comprising at least 8 consecutive
nucleotides, including the polymorphic nucleotide, thereof.

134. An isolated nucleic acid comprising a sequence selected from the group consisting of SEQ ID No.
304, SEQ ID No. 310, the sequences complementary thereto, and fragments comprising at least 8 consecutive
nucleotides, including the polymorphic nucleotide, thereof.

135. An isolated nucleic acid comprising a sequence selected from the group consisting of SEQ ID No.
305, SEQ ID No. 311, the sequences complementary thereto, and fragments comprising at least 8 consecutive
nucleotides, including the polymorphic nucleotide, thereof.

136. An isolated nucleic acid comprising a sequence selected from the group consisting of SEQ ID Nos. 
313-317, SEQ ID Nos. 319-323, and fragments comprising at least 8 consecutive nucleotides thereof. 

137. An isolated nucleic acid comprising a sequence selected from the group consisting of SEQ ID Nos.
325-329, SEQ ID Nos. 331-335, the sequence complementary thereto, and fragments comprising at least 8 consecutive
nucleotides thereof.

138. A set of nucleic acids comprising at least 8 consecutive nucleotides, including the polymorphic
nucleotide, of one or more biallelic markers obtained by the method of Claim 1.

139. A set of nucleic acids comprising amplification primers for generating an amplification product
comprising at least 8 consecutive nucleotides, including the polymorphic nucleotide, of one or more biallelic markers
obtained by the method of Claim 1.

140. A set of nucleic acids comprising one or more microsequencing primers for determining the identity of
the polymorphic base of one or more nucleic acids comprising at least 8 consecutive nucleotides, including the
polymorphic nucleotide, of one or more biallelic markers obtained by the method of Claim 1.

Figure 1: Prostate cancer haplotype simulations (100 iterations)

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: GENSET SA 

(B) STREET: 24, RUE ROYALE 

(C) CITY: PARIS 

(E) COUNTRY: FRANCE 

(F) POSTAL CODE (ZIP): 75008 



(ii) TITLE OF INVENTION: Biallelic markers for use in constructing a 
high density disequilibrium map of the human genome. 

(iii) NUMBER OF SEQUENCES: 336 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy Disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: Win95 

(D) SOFTWARE: Word 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2103-270 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2103-270 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2103-270 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CTTGGATTCA TATGAGACAG CTAGCAGACC TTCAATTTTT CTACACT 47
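
Each listing entry annotates a 47-base fragment with the polymorphic base at position 24 and two potential microsequencing oligos at 1..23 and complement 25..47. The sketch below shows how those features map onto SEQ ID NO: 1. Reading the `complement` location as the reverse complement written 5' to 3' is an assumption, and the function names are hypothetical.

```python
# SEQ ID NO: 1 as printed in the listing, with the spacing removed.
SEQ_ID_NO_1 = "CTTGGATTCATATGAGACAGCTAGCAGACCTTCAATTTTTCTACACT"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def microsequencing_oligos(fragment, poly_pos=24, flank=23):
    """Split a fragment around its polymorphic base (1-based position).

    Returns (upstream_oligo, polymorphic_base, downstream_oligo), where the
    downstream oligo is written 5'->3' on the complementary strand
    (an assumed reading of the "complement 25..47" feature location).
    """
    upstream = fragment[poly_pos - 1 - flank:poly_pos - 1]   # positions 1..23
    base = fragment[poly_pos - 1]                            # position 24
    downstream = fragment[poly_pos:poly_pos + flank]         # positions 25..47
    return upstream, base, reverse_complement(downstream)

upstream, base, downstream = microsequencing_oligos(SEQ_ID_NO_1)
print(upstream)    # oligo covering positions 1..23
print(base)        # polymorphic base at position 24
print(downstream)  # oligo on the complementary strand of positions 25..47
```

Both oligos end immediately adjacent to position 24, so extending either one by a single nucleotide reads the polymorphic base or its complement.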



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2228-301 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2228-301 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2228-301 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CCCTGCTTAT CCCTGTAAGG TGGAGACCCA TATGGGCAAG GCCAGAC 47



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2229-240 

(B) LOCATION: 1..47 

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2229-240 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2229-240 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TCGTCATCGT GGCCTGGGCT ACAGACTACC TGTTCCAGTC CTTCCAG 47

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2240-281 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2240-281 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2240-281

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GCAATCTTAA TAACTTTTTA TTTCAGTAAT TCGAATCTTT TTTTTCT 47

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2242-206 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2242-206 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2242-206 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GTGTTTTCTT TTAGTCAAAT TATCTTATAT TTTACTTTTT TCTTAAG 47



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2244-83 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2244-83 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2244-83 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TAATTGTAGA TACTAAGACC ATTATGCTTA AACCATGTAG GTACTGA 47

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2246-340 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2246-340 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2246-340 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
ATTTATATGT TAAATGCAGA GAAAAAGAAA AATAAGTTTT GCAGTAA 47 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2248-76

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2248-76 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2248-76 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GACAGAGAGG GAAGGTAATC TTCCCCTGAA GTCTGCCCAT CCCCTGG 47



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2250-236 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2250-236 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2250-236 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
ATGTATCCAA AACAGAATTA ACACACTTTG GGTTTTTTAT TTTTATT 47

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2251-151 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2251-151 

(B) LOCATION: 1..23 

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2251-151

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TGAAAAGAAG TTCAGACGAT TGCAGATAGA CTAGTTTGGC TGTTGTG 47

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2269-179

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2269-179 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2269-179

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
AAAATAAAGA AATTCCTAGA GACATACAGC CTATCAAGAT CAAACCA 47



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2271-403 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2271-403 

(B) LOCATION: 1..23 

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2271-403

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AGGCATTTAT TTCATATTTA TTAACCTTGA TTTTCTTATC TTCAAGT 47



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2272-409 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2272-409 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2272-409 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
AAAAGCACTG CAATTATTTT GGAGACTGTG AAATATTGCA AGTTTTA 47



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2273-528 

(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base c 



(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2273-528

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2273-528

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
ACTTGAAGAT AAGAAAATCA AGGCTAATAA ATATGAAATA AATGCCT 47



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2275-466 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2275-466 

(B) LOCATION: 1..23 



(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2275-466 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TTGATGATAG CATTAAATAC TCCCAAAAAC TGTGAATAGG GATACTA 47



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2278-276 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2278-276 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2278-276

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GAAAAAAATG GGAACATCTT CACAGCCTGT GCATCTCCAA CAAGATT 47



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2312-358 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2312-358

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2312-358

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
TTGAAGAGAG AGATGGAAAA AAACGTAGGC CTTCTGGGTA AATGGCC 47



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2315-213 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2315-213 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2315-213 

(B) LOCATION: complement 25.. 47 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
AGATGGATTC TACCCACAGG CAAAAGAAAA CCTTATTTTA AAAATAA



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2320-292 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2320-292 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2320-292 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
ACTCTCATTC ACTAAACTTC AACCGTTTTT ATAAATTTAA TGAATTT 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 
(A) NAME/KEY: polymorphic fragment 99-2321-82

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2321-82 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2321-82

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TAAAGCTTAC TGAGTGTCCA CTCCGGATAC CTACTCAAAT ATTTCCT 47

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2324-338 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2324-338 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2324-338 

(B) LOCATION: complement 25.. 47 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
AGATAGAAGA CAAAATCGCA GGAAAAGAAA TCCCTCAACA GTAAAAA 47

(2) INFORMATION FOR SEQ ID NO: 22:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2333-423 

(B) LOCATION: 1..47 



(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2333-423 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2333-423 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 



GAGACGCTAT CTATGCAAGG AGGGTGTTCA ACATTTGGAC AGCCACG 47



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2341-485 

(B) LOCATION: 1..47 
(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2341-485
(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2341-485
(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
ACACATCTGT CTGTTACCTA CACCTTACAA AGAATCGCAC AGGCTCT 47



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2342-217 

(B) LOCATION: 1..47



(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 



(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2342-217 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2342-217 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 



TAGAGCCTTG GACTTTCATG ACACTTCTAG AAACAGCCCA GATTGTG 47

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2362-270

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2362-270 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2362-270 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:



TCTCTCTTGG GTGGTTCCTC AACATGTGTG ACCTTGACCA AGTATTG 47



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2364-329 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 
(D) OTHER INFORMATION: base g

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2364-329

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2364-329

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
ATATAAAATG ATGAACCATA TACGTGAGGC AAGGTAACAT ATAATTG 47

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2367-61 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2367-61 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2367-61 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
TAAACATTTC ATTATTTCAG AAAATAATAT GCATTTTCAC CAACACA 47



(2) INFORMATION FOR SEQ ID NO: 28: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2371-93 
(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2371-93 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2371-93 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CTCTAAACTT TCCTAATACT TACATCACTG CCTACTTTTT ACATAAT 47



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2378-200 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 



(ix) FEATURE: 
(A) NAME/KEY: Potential microsequencing oligo 99-2378-200

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2378-200 
(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GAGAACTTCC TGTTGAACCT GTTATAGAAC TGTCCTGTCG TCCAAGA 47

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2381-394 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2381-394 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2381-394 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
AGTGGTCTTC AGGTTATTGG TAGAGAAAAG TAGGGGAGCT AAAGGTG 47

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2413-368 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2413-368 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2413-368 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 



ATTTTAAGAG GAAAACTTAA TGGAAGAATT GTACATAATA TTTCATT 



(2) INFORMATION FOR SEQ ID NO: 32: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2419-285 

(B) LOCATION: 1..47 



(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 



(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2419-285 

(B) LOCATION: 1..23



WO 99/04038 PCT/IB98/01193

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2419-285 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
AAGGGATCAA GCAGTGCCCA CTCCCCACCC TCCAGGGAGC TGTGACT 47

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2559-253 

(B) LOCATION: 1..47 

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2559-253 
(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2559-253
(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CAGGTGTTTT CATGCCCTCT TAGGGTGTGT CACATCATCC ATCTCAA 47

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA 
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2566-112 

(B) LOCATION: 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2566-112

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2566-112 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 



GCCTTCACAA CCGCAGAGGC AAGAGAAGGA GCTTGGCCAC CCTGACT 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2567-329 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2567-329 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2567-329 

(B) LOCATION: complement 25.. 47 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CACTGTCAGA TATGAAATGA TGCGTGGCTT TCTTTGGGCT ATATTTG 47



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-2570-218

(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base c

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2570-218 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2570-218 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GGAAAGTTCC AAATTATGAG AAGCGAGGCC TCTGAAGTGG CTAAGTT 47



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-2571-242

(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base a

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2571-242
(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2571-242 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
ATAATGAATG AGTATTTGAT ATTATATAAT TAAATGTGTC AGCATTT 47



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2610-121 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2610-121 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2610-121 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
ATACCCCTTC CCTAGGTATG GCTATATGCT GCACTTAGAA AATTCTC



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-2615-83
(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2615-83 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2615-83 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
AACAAATCAC AAGTTGGCAA AAGCAGCAAA TTCTCATCTT CTGGGAA 47



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2620-227 
(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2620-227

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2620-227 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
TTGACTGGGC TCCTGATGTG TCCAGGGTAT CTTGCTGGCT GTTTTGC 47

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2624-407 

(B) LOCATION: 1..44

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2624-407 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2624-407 

(B) LOCATION: complement 25.. 44 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 



ATCTGGCCAT AGGCAGAACA TTGGGGGAGA GATGGGGAAA GAGA 44



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2625-70 

(B) LOCATION: 1..47 

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2625-70 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2625-70 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 



AGTGACTCAA CCAGAAAGAG AGCAGGAGAG AGGACGAAGA GAGGAGA 47



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2630-67 

(B) LOCATION: 1..47 

(ix) FEATURE: 
(A) NAME/KEY: polymorphic base

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2630-67 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2630-67

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
TAAATTCTGC CTAGAAGATT AAGATTGGTC CAGAACAGGG AGTGTTT 47

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2633-129 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2633-129 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2633-129 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
TAGCTATTTC TTCCCCTAGG CAAAGTAGAC AATGAGAGAA CCCTTGA 47



(2) INFORMATION FOR SEQ ID NO: 45: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2634-341

(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2634-341 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2634-341 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 



GGAATCAATA TTTATTTATT ATCAACAGGT GAGACATTAT TTATTTA 47



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2637-28 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 
(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2637-28 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2637-28 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
CCATCACTTC CTCCTAGTGA AAAATCAAAG GAGGGTGGGT TTTATAG 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2642-255 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2642-255 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2642-255 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
TGAGGGTGTT TCCAGAAGAG ACTAGCATTT GAATCTGAAG TGAGTAA 47

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2645-118 

(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2645-118 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2645-118 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 



CACAAATTAA TTGCATTGTT ATAGGCTAGC AATGAAGAAT CTGAAAA 47



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2647-368 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2647-368 
(B) LOCATION: 1..23
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2647-368 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
TTAAGGCCTT CAACTGATTA GACAAGGCCC ACTCACATTA TCTGACA 47

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2649-107
(B) LOCATION: 1..47

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2649-107 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2649-107 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
CACAACTCTG GAGCCTTTTA TGAACAGGAC AGCAATGCAC TGAAACT 47



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2103-270 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID1

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c; g in SEQ ID1
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2103-270 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2103-270 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 



CTTGGATTCA TATGAGACAG CTACCAGACC TTCAATTTTT CTACACT 47



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2228-301 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID2 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID2 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2228-301

(B) LOCATION: 1..23
(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2228-301

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 



CCCTGCTTAT CCCTGTAAGG TGGGGACCCA TATGGGCAAG GCCAGAC 47



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2229-240 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID3 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; g in SEQ ID3 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2229-240 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2229-240 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 



TCGTCATCGT GGCCTGGGCT ACATACTACC TGTTCCAGTC CTTCCAG 47



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2240-281

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID4 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID4
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2240-281 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2240-281 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 



GCAATCTTAA TAACTTTTTA TTTTAGTAAT TCGAATCTTT TTTTTCT 47



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2242-206 
(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID5 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID5 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2242-206 

(B) LOCATION: 1..23 
(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2242-206

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 



GTGTTTTCTT TTAGTCAAAT TATTTTATAT TTTACTTTTT TCTTAAG 47



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2244-83 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID6

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID6 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2244-83 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2244-83

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 



TAATTGTAGA TACTAAGACC ATTGTGCTTA AACCATGTAG GTACTGA 47 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2246-340 
(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID7 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID7
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2246-340 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2246-340 

(B) LOCATION: complement 25.. 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 



ATTTATATGT TAAATGCAGA GAAGAAGAAA AATAAGTTTT GCAGTAA 47



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2248-76 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID8

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID8
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2248-76 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2248-76 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 



GACAGAGAGG GAAGGTAATC TTCTCCTGAA GTCTGCCCAT CCCCTGG 47



(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2250-236 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID9 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID9 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2250-236 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2250-236 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 



ATGTATCCAA AACAGAATTA ACATACTTTG GGTTTTTTAT TTTTATT 47



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2251-151 
(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID10

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID10
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2251-151 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2251-151 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
TGAAAAGAAG TTCAGACGAT TGCGGATAGA CTAGTTTGGC TGTTGTG 47



(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2269-179 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID11

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID11
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2269-179 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2269-179 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 



AAAATAAAGA AATTCCTAGA GACGTACAGC CTATCAAGAT CAAACCA 47



(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2271-403 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID12 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID12 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2271-403 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2271-403 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 



AGGCATTTAT TTCATATTTA TTAGCCTTGA TTTTCTTATC TTCAAGT 47



(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2272-409 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID13

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; g in SEQ ID13
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2272-409 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2272-409 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 



AAAAGCACTG CAATTATTTT GGATACTGTG AAATATTGCA AGTTTTA 47



(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-2273-528 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID14 

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID14 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2273-528 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2273-528 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:



ACTTGAAGAT AAGAAAATCA AGGTTAATAA ATATGAAATA AATGCCT 47



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2275-466

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID15 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID15 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2275-466 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2275-466 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 



TTGATGATAG CATTAAATAC TCCTAAAAAC TGTGAATAGG GATACTA 47



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2278-276

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID16 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID16
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2278-276 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2278-276 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:



GAAA.^W\ATG GGAACATCTT CACGGCCTGT GCATCTCCAA CAAGATT 47



(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2312-358 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID17 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID17 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2312-358 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2312-358 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 



TTGAAGAGAG AGATGGAAAA AAATGTAGGC CTTCTGGGTA AATGGCC 47 



(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2315-213 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID18

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID18 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2315-213 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2315-213 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 



AGATGGATTC TACCCACAGG CAAGAGAAAA CCTTATTTTA AAAATAA 47



(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2320-292 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID19 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID19

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2320-292 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2320-292 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 



ACTCTCATTC ACTAAACTTC AACTGTTTTT ATAAATTTAA TGAATTT 47



(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2321-82 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID20 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID20 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2321-82 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2321-82 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 



TAAAGCTTAC TGAGTGTCCA CTCTGGATAC CTACTCAAAT ATTTCCT 47



(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2324-338 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID21 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c; a in SEQ ID21 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2324-338 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2324-338 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 



AGATAGAAGA CAAAATCGCA GGACAAGAAA TCCCTCAACA GTAAAAA 47 



(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2333-423 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID22 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; g in SEQ ID22 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2333-423 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2333-423 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 



GAGACGCTAT CTATGCAAGG AGGTTGTTCA ACATTTGGAC AGCCACG 47



(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2341-485 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID23 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID23 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2341-485 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2341-485 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:



ACACATCTGT CTGTTACCTA CACTTTACAA AGAATCGCAC AGGCTCT 47



(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2342-217 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID24 

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID24 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2342-217 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2342-217 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 



TAGAGCCTTG GACTTTCATG ACATTTCTAG AAACAGCCCA GATTGTG 47



(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2362-270 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID25 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID25
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2362-270 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2362-270 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 



TCTCTCTTGG GTGGTTCCTC AACGTGTGTG ACCTTGACCA AGTATTG 47



(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2364-329 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID26 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c; g in SEQ ID26 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2364-329 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2364-329 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 



ATATAAAATG ATGAACCATA TACCTGAGGC AAGGTAACAT ATAATTG 47



(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2367-61 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID27 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID27 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2367-61 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2367-61 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 



TAAACATTTC ATTATTTCAG AAAGTAATAT GCATTTTCAC CAACACA 47



(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2371-93 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID28 

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c; a in SEQ ID28 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2371-93

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2371-93 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
CTCTAAACTT TCCTAATACT TACCTCACTG CCTACTTTTT ACATAAT 47



(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2378-200 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID29 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID29 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2378-200 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2378-200 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
GAGAACTTCC TGTTGAACCT GTTGTAGAAC TGTCCTGTCG TCCAAGA 47



(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2381-394 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID30 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID30 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2381-394 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2381-394 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 



AGTGGTCTTC AGGTTATTGG TAGGGAAAAG TAGGGGAGCT AAAGGTG 47



(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 






(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2413-368 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID31 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID31
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2413-368 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2413-368 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 
ATTTTAAGAG GAAAACTTAA TGGGAGAATT GTACATAATA TTTCATT 47



(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2419-285 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID32 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID32 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2419-285 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2419-285

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 



AAGGGATCAA GCAGTGCCCA CTCTCCACCC TCCAGGGAGC TGTGACT 47 



(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2559-253 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID33 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; g in SEQ ID33 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2559-253 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2559-253 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 



CAGGTGTTTT CATGCCCTCT TAGTGTGTGT CACATCATCC ATCTCAA 47



(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2566-112 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID34

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24

(D) OTHER INFORMATION: base g; a in SEQ ID34 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2566-112 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2566-112 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 



GCCTTCACAA CCGCAGAGGC AAGGGAAGGA GCTTGGCCAC CCTGACT 47



(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2567-329 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID35 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; g in SEQ ID35 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2567-329 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2567-329 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 



CACTGTCAGA TATGAAATGA TGCTTGGCTT TCTTTGGGCT ATATTTG 47



(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2570-218 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID36 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID36 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2570-218 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2570-218 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 



GGAAAGTTCC AAATTATGAG AAGTGAGGCC TCTGAAGTGG CTAAGTT 47 



(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2571-242 
(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID37 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID37
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2571-242 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2571-242 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 



ATAATGAATG AGTATTTGAT ATTGTATAAT TAAATGTGTC AGCATTT 47



(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2610-121 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID38 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c; a in SEQ ID38 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2610-121 

(B) LOCATION: 1..23 






(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2610-121 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 



ATACCCCTTC CCTACGTATG GCTCTATCCT GCACTTAGAA AATTCTC 47



(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2615-83 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID39 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24

(D) OTHER INFORMATION: base t; c in SEQ ID39 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2615-83 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2615-83 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 



AACAAATCAC AAGTTGGCAA AAGTAGCAAA TTCTCATCTT CTGGGAA 47



(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2620-227 
(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID40

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID40 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2620-227 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2620-227 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 



TTGACTGGGC TCCTGATGTG TCCGGGGTAT CTTGCTGGCT GTTTTGC 47



(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2624-407 

(B) LOCATION: 1..44 

(D) OTHER INFORMATION: variant version of SEQ ID41 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; g in SEQ ID41 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2624-407 

(B) LOCATION: 1..23 






(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2624-407 

(B) LOCATION: complement 25..44

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 



ATCTGGCCAT ACGCAGAACA TTGTGGGAGA GATGGGGAAA GAGA 44



(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2625-70 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID42 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 
(B) LOCATION: 24

(D) OTHER INFORMATION: base g; a in SEQ ID42
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2625-70 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2625-70 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 



AGTGACTCAA CCAGAAAGAG AGCGGGAGAG AGGACGAAGA GAGGAGA 47



(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2630-67 
(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID43

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID43 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2630-67 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2630-67 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:



TAAATTCTGC CTAGAAGATT AAGGTTGGTC CAGAACAGGG AGTGTTT 47



(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2633-129 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID44 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c; a in SEQ ID44
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2633-129 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2633-129 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 



TAGCTATTTC TTCCCCTAGG CAACGTAGAC AATGAGAGAA CCCTTGA 47



(2) INFORMATION FOR SEQ ID NO: 95:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2634-341 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID45

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID45
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2634-341

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2634-341 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95: 



GGAATCAATA TTTATTTATT ATCGACAGGT GAGACATTAT TTATTTA 47



(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2637-28

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID46

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID46 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2637-28 

(B) LOCATION: 1..23

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2637-28 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 



CCATCACTTC CTCCTAGTGA AAAGTCAAAG GAGGGTGGGT TTTATAG 47



(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2642-255 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID47 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2642-255 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2642-255

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 



TGAGGGTGTT TCCAGAAGAG ACTGGCATTT GAATCTGAAG TGAGTAA 47



(2) INFORMATION FOR SEQ ID NO: 98:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2645-118 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID48 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; g in SEQ ID48 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2645-118 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2645-118 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 



CACAAATTAA TTGCATTGTT ATATGCTAGC AATGAAGAAT CTGAAAA 47



(2) INFORMATION FOR SEQ ID NO: 99: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2647-368

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID49

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base g; a in SEQ ID49
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2647-368 

(B) LOCATION: 1..23 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2647-368 

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 
TTAAGGCCTT CAACTGATTA GACGAGGCCC ACTCACATTA TCTGACA 47



(2) INFORMATION FOR SEQ ID NO: 100: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2649-107 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID50 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; a in SEQ ID50 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-2649-107 

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2649-107

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100: 
CACAACTCTG GAGCCTTTTA TGATCAGGAC AGCAATGCAC TGAAACT 47



(2) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID1, SEQ ID51

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 



CCTGGATTCT GACCCATC 18 



(2) INFORMATION FOR SEQ ID NO: 102: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID2, SEQ ID52 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 



TCTACCTCTA CCTCTTTC 18



(2) INFORMATION FOR SEQ ID NO: 103:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID3, SEQ ID53 

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:



CTTCCCATAC CTCTGATAC 19



(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID4, SEQ ID54 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104: 



TTCAACAGTG AAGCCATC 18



(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID5, SEQ ID55

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:
TGATGTGTGT GACTCAGG 18



(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID6, SEQ ID56 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: 
ATAGAGGAAC CAAACCTG 18 



(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID7, SEQ ID57

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

AGCAGCATGG AAGCAAAC 18 



(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID8, SEQ ID58 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 
CTGATGAAAG TGGCTCTC 18 



(2) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID9, SEQ ID59 

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: 



TGTATCTGAG GTCTAAAAC 19



(2) INFORMATION FOR SEQ ID NO: 110:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID10, SEQ ID60

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110: 



TATATGTAGA GGGTGAGG 18 



(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID11, SEQ ID61

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 



AGGCTAAGAA AAAAAGAGG 19 



(2) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID12, SEQ ID62 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 
TGAAAAGACT AAGTTCTGG 19 



(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID13, SEQ ID63 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 
ATGCTAGAGG AAAGGAAC 18 



(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID14, SEQ ID64 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
ATACCAGGGA CTTTAGTG 18



(2) INFORMATION FOR SEQ ID NO: 115:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID15, SEQ ID65 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 
AGATTCAGAC CAATTTCAC 19 



(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID16, SEQ ID66 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 
TGCTTTGATT TGACCCTG 18 



(2) INFORMATION FOR SEQ ID NO: 117:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID17, SEQ ID67

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:



GCCTATCTTG TTTTGACTG 19



(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID18, SEQ ID68

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 



TTCAGAGCAA CAATTTTGG 19



(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: upstream amplification primer for SEQ ID19, SEQ ID69

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: 
CCAAGTTTAT GAGATTAGAG 20 



(2) INFORMATION FOR SEQ ID NO: 120: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID20, SEQ ID70 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: 
CTAACCTAGA TGATCTTCC 19 



(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID21, SEQ ID71 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 



TGTCCCAAGT TTAGTTCC 18



(2) INFORMATION FOR SEQ ID NO: 122:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: upstream amplification primer for SEQ ID22, SEQ ID72

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:



CCAGGAATAA TACTTTGCAT C 21



(2) INFORMATION FOR SEQ ID NO: 123: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID23, SEQ ID73 

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123: 



CTCAGTTTTT CTTTCCACC 19



(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID24, SEQ ID74

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:
GACTCAGGCA CAACTTTTAG 20



(2) INFORMATION FOR SEQ ID NO: 125: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID25, SEQ ID75 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125: 
TACAGCAATG GTATAAAGC 19



(2) INFORMATION FOR SEQ ID NO: 126: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID26, SEQ ID76

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:

TTATCCATCA TTTAGAAGGC 20



(2) INFORMATION FOR SEQ ID NO: 127: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID27, SEQ ID77 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127: 
CACTGGAGAT AGCTGAAC 18 



(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID28, SEQ ID78 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128: 



GTACTGTCAA ATCATCACC 19



(2) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID29, SEQ ID79

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:



CGGGCATAAA AATGCAGG 18



(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID30, SEQ ID80 

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130: 



GTATATGTGA AGGTTGTGGG 20



(2) INFORMATION FOR SEQ ID NO: 131: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID31, SEQ ID81

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 
GTAAGATGTG ACTTGCTCC 19 



(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID32, SEQ ID82 

(B) LOCATION: 1..20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132: 
CCAGCTTGAA TTTTGGTGAG 20 



(2) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID33, SEQ ID83 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:
GCATATCTTG GTGGTCTG 18



(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID34, SEQ ID84

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: 
AGGGTTCAAA GGAAGGAGG 19 



(2) INFORMATION FOR SEQ ID NO: 135: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID35, SEQ ID85 

(B) LOCATION: 1..20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: 
GAAAAAGAAG GGAAAGAAAG 20 



(2) INFORMATION FOR SEQ ID NO: 136:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID36, SEQ ID86 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:
GTTTGTCTTG GCTATTAAG 19 



(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID37, SEQ ID87 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: 
TGAAAAAGTG GGTAGCAG 18

(2) INFORMATION FOR SEQ ID NO: 138: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: upstream amplification primer for SEQ ID38, SEQ ID88

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138: 
ATATCAGGGC AGGCACAAG 19 



(2) INFORMATION FOR SEQ ID NO: 139: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID39, SEQ ID89 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139: 
GGAAGAGGGC AACTTTAC 18 



(2) INFORMATION FOR SEQ ID NO: 140: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID40, SEQ ID90 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140: 



TGAAATGGGC TGTAGATG 18



(2) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID41, SEQ ID91

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:



TTAAACCTTG GCTTCCTG 18



(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID42, SEQ ID92 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142: 



TTCAACCTTT TGTCGCTG 18



(2) INFORMATION FOR SEQ ID NO: 143: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 



WO 99/04038 PCT/IB98/01193

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID43, SEQ ID93 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143: 
ATGTAACAGA TGTCCAAAG 19



(2) INFORMATION FOR SEQ ID NO: 144: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID44, SEQ ID94 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144: 
CTAAGGGTCT TCTTTCTG 18 



(2) INFORMATION FOR SEQ ID NO: 145: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID45, SEQ ID95

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:

GGTGTATTTA GGTTTGTGG 19 



(2) INFORMATION FOR SEQ ID NO: 146:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID46, SEQ ID96

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146: 
CTACCATCAC TTTCCTCC 18 



(2) INFORMATION FOR SEQ ID NO: 147: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID47, SEQ ID97 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147: 



ATAACTAGGC ATCCAGAC 18



(2) INFORMATION FOR SEQ ID NO: 148:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID48, SEQ ID98 

(B) LOCATION: 1..20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148: 
CGACATAATT TGGTATGTAG 20 

(2) INFORMATION FOR SEQ ID NO: 149:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID49, SEQ ID99 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149: 
TCACCAAGTG TCATCGTC 18 

(2) INFORMATION FOR SEQ ID NO: 150: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID50, SEQ ID100

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150: 

GAGACTTTGT AACTTTGTG 19 



(2) INFORMATION FOR SEQ ID NO: 151: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID1, SEQ ID51

(B) LOCATION: 1..20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151: 

GTCTTCATAA GTCTTCAGTG 20 



(2) INFORMATION FOR SEQ ID NO: 152: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID2, SEQ ID52

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152:

CAAAACACTC CCTCACAC 18 



(2) INFORMATION FOR SEQ ID NO: 153: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID3, SEQ ID53

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153:

CAGGTGATGT CTGGATAC 18 

(2) INFORMATION FOR SEQ ID NO: 154: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID4, SEQ ID54

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:

AAGACAACAA GAACTAAATC C 21



(2) INFORMATION FOR SEQ ID NO: 155:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID5, SEQ ID55

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155: 

TCCCCAATAG ATTAAAGTTC 20 

(2) INFORMATION FOR SEQ ID NO: 156: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID6, SEQ ID56

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156: 

CTGAGCATCA AATAGGAG 18 



(2) INFORMATION FOR SEQ ID NO: 157: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID7, SEQ ID57

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157:

TCATTACAGA AAAAGCCAAA G 21



(2) INFORMATION FOR SEQ ID NO: 158: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID8, SEQ ID58

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158: 

TCCTTCTCCA CCTAAAATTC 20 



(2) INFORMATION FOR SEQ ID NO: 159: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID9, SEQ ID59

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159: 

ACTGCTTCTG CTCTCTTG 18 

(2) INFORMATION FOR SEQ ID NO: 160: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID10, SEQ ID60

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160:

TGAACATACA /^JiAACACTGG 20 



(2) INFORMATION FOR SEQ ID NO: 161: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID11, SEQ

ID61

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161:

AGAGTTGTTG GCATGTAG 18



(2) INFORMATION FOR SEQ ID NO: 162: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID12, SEQ

ID62

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162: 

AACTGCTCAG CAACTGTG 18 



(2) INFORMATION FOR SEQ ID NO: 163: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID13, SEQ

ID63

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163: 

TTAGAACACT TTTATGGGAA C 21 



(2) INFORMATION FOR SEQ ID NO: 164: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID14, SEQ

ID64

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:

GTCCTAGAAT GAGCAAATG 19



(2) INFORMATION FOR SEQ ID NO: 165: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID15, SEQ

ID65

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165: 



AGAGAAAGAA CCAGAGCC 18 



(2) INFORMATION FOR SEQ ID NO: 166: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
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(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID16, SEQ

ID66

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166: 

TGGAGTCTAA ACTAGGTG 18 



(2) INFORMATION FOR SEQ ID NO: 167: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID17, SEQ 

ID67 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167: 

GGACCTTTTA AGAGTGTG 18 



(2) INFORMATION FOR SEQ ID NO: 168: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID18, SEQ

ID68 

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168: 

TGGTTTCTTC AAACAAGAG 19 

(2) INFORMATION FOR SEQ ID NO: 169: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID19, SEQ

ID69

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169: 

AAGTTGGATA ACCTTCTTTT G 21 



(2) INFORMATION FOR SEQ ID NO: 170: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID20, SEQ

ID70

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170: 
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TAGTTTCGTG AACTTATCC 19 



(2) INFORMATION FOR SEQ ID NO: 171: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID21, SEQ

ID71

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171: 

GTTTACATTA TGCCCCTTTT C 21 



(2) INFORMATION FOR SEQ ID NO: 172: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID22, SEQ

ID72

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172: 

CTCCACTGCC ACAACTTC 18 

(2) INFORMATION FOR SEQ ID NO: 173: 
(i) SEQUENCE CHARACTERISTICS: 
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(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID23, SEQ

ID73

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:

TGCTCTGCTT GTAATGTTAT G 21



(2) INFORMATION FOR SEQ ID NO: 174: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID24, SEQ

ID74

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174: 

CAAGGTTGCC AGTCACATC 19 



(2) INFORMATION FOR SEQ ID NO: 175: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA 
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(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID25, SEQ 

ID75 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175:

ATGAAGATAC GCAGCCAG 18 

(2) INFORMATION FOR SEQ ID NO: 176:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID26, SEQ

ID76

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176: 

CTCATTTAAC TCCCATTCCT C 21 



(2) INFORMATION FOR SEQ ID NO: 177: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID27, SEQ

ID77

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177:

TGCTTTTCTT GTCCCTGATT G 21

(2) INFORMATION FOR SEQ ID NO: 178: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID28, SEQ

ID78

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178: 

GCATTGAATC CGTAAATTTC 20 

(2) INFORMATION FOR SEQ ID NO: 179: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID29, SEQ

ID79

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179: 

CAGTTTTGGT CATTGTGGGA G 21 
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(2) INFORMATION FOR SEQ ID NO: 180: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID30, SEQ

ID80

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180: 

AAATCCAACT ATGTCACTTC C 21 



(2) INFORMATION FOR SEQ ID NO: 181: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID31, SEQ

ID81

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181: 

AATGTCCCCT CCTCCTCTG 19 



(2) INFORMATION FOR SEQ ID NO: 182: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 






(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID32, SEQ 

ID82 

(B) LOCATION: 1..20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182: 

GCCACAAGTA TTTGGGTGCC 20 



(2) INFORMATION FOR SEQ ID NO: 183: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID33, SEQ

ID83

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: 

CCTACGGTTT GTCATAAAG 19 



(2) INFORMATION FOR SEQ ID NO: 184: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
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(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID34, SEQ 

ID84 

(B) LOCATION: 1..21 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184: 

TGTAACAGGG GACATGGGAA G 21 



(2) INFORMATION FOR SEQ ID NO: 185: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID35, SEQ 

ID85 

(B) LOCATION: 1..20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185: 

CAATTTTGTA TGGATGACAG 20 

(2) INFORMATION FOR SEQ ID NO: 186:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID36, SEQ

ID86

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186: 
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TGGTGGTGGA AAAAAAGAAG G 21 



(2) INFORMATION FOR SEQ ID NO: 187: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID37, SEQ

ID87

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187: 

CTATAACTCT TATCAGTGAA C 21 



(2) INFORMATION FOR SEQ ID NO: 188: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID38, SEQ

ID88

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188:

AGGTCACTCA AGTATTATGG 20 



(2) INFORMATION FOR SEQ ID NO: 189: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID39, SEQ

ID89

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:

CCCCAGCTCC CAAATAATGA C 21



(2) INFORMATION FOR SEQ ID NO: 190: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID40, SEQ 

ID90 

(B) LOCATION: 1..20 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190: 



TCCACAACAG ACACTTAAAC 20



(2) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 
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(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID41, SEQ

ID91 

(B) LOCATION: 1..18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:

TCTCTTTCCC CATCTCTC 18 



(2) INFORMATION FOR SEQ ID NO: 192: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID42, SEQ 

ID92 

(B) LOCATION: 1..19 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192: 



TCCCCTTCTA TTGTCTACC 19



(2) INFORMATION FOR SEQ ID NO: 193: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID43, SEQ

ID93

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193:

GGTTTGTGTT CAGTACGG 18 



(2) INFORMATION FOR SEQ ID NO: 194: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID44, SEQ

ID94

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194: 

TGTATATGCC TGGTGGAAAT G 21 

(2) INFORMATION FOR SEQ ID NO: 195: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID45, SEQ

ID95

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195: 



GTGAAAGAAA CTTGATAGAG G 21
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(2) INFORMATION FOR SEQ ID NO: 196: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID46, SEQ

ID96

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196: 



CCTCCAACAG TAAGAATC 18 



(2) INFORMATION FOR SEQ ID NO: 197: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID47, SEQ

ID97

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197: 

CAGAACCATT AACTATTCAC 20 



(2) INFORMATION FOR SEQ ID NO: 198: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 
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(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID48, SEQ

ID98

(B) LOCATION: 1..20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198: 



GCCATTTGGA ATTTTGATAG 20



(2) INFORMATION FOR SEQ ID NO: 199: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID49, SEQ

ID99

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199: 

TGCAGCATCC CTGGAAGTC 19 



(2) INFORMATION FOR SEQ ID NO: 200: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
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(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID50, SEQ

ID100

(B) LOCATION: 1..21 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200: 

GAGACATCAT ATCTGTGTTT G 21 



(2) INFORMATION FOR SEQ ID NO: 201: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2103-270.mis1

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201: 
GATTCATATG AGACAGCTA 19 



(2) INFORMATION FOR SEQ ID NO: 202: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2228-301.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202:

CCCTGCTTAT CCCTGTAAGG TGG 23



(2) INFORMATION FOR SEQ ID NO: 203: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2229-240.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203:

TCGTCATCGT GGCCTGGGCT ACA 23



(2) INFORMATION FOR SEQ ID NO: 204: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2240-281.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204:

TCTTAATAAC TTTTTATTT 19



(2) INFORMATION FOR SEQ ID NO: 205: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2242-206.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205:

TTTCTTTTAG TCAAATTAT 19



(2) INFORMATION FOR SEQ ID NO: 206: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2244-83.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206:

TAATTGTAGA TACTAAGACC ATT 23



(2) INFORMATION FOR SEQ ID NO: 207: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2246-340.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207: 
ATTTATATGT TAAATGCAGA GAA 23 

(2) INFORMATION FOR SEQ ID NO: 208: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2248-76.mis1

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208: 
GAGAGGGAAG GTAATCTTC 19 

(2) INFORMATION FOR SEQ ID NO: 209: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2250-236.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209: 



TTTTATCCAA AACAGAATTA ACA 23
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(2) INFORMATION FOR SEQ ID NO: 210: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2251-151.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210: 



TGAAAAGAAG TTCAGACGAT TGC 23 



(2) INFORMATION FOR SEQ ID NO: 211: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2269-179.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211: 



AAAATAAAGA AATTCCTAGA GAC 23 



(2) INFORMATION FOR SEQ ID NO: 212: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
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(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2271-403.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212: 
AGGCATTTAT TTCATATTTA TTA 23 



(2) INFORMATION FOR SEQ ID NO: 213: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2272-409.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213: 
AAAAGCACTG CAATTATTTT GGA 23 



(2) INFORMATION FOR SEQ ID NO: 214: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2273-528.mis1

(B) LOCATION: 1..19 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214: 
GAAGATAAGA AAATCAAGG 19 

(2) INFORMATION FOR SEQ ID NO: 215: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2275-466.mis1

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215: 
TGATAGCATT AAATACTCC 19 

(2) INFORMATION FOR SEQ ID NO: 216: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2278-276.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216: 
GAAAAAAATG GGAACATCTT CAC 23 



(2) INFORMATION FOR SEQ ID NO: 217: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2312-358.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217: 



TTTTAGAGAG AGATGGAAAA AAA 23 



(2) INFORMATION FOR SEQ ID NO: 218: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2315-213.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218: 



AGATGGATTC TACCCACAGG CAA 23 



(2) INFORMATION FOR SEQ ID NO: 219: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 
(vi) ORIGINAL SOURCE: 
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(A) ORGANISM: Homo sapiens 
(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2320-292.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219:

TCATTCACTA AACTTCAAC 19



(2) INFORMATION FOR SEQ ID NO: 220: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2321-82.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220:

GCTTACTGAG TGTCCACTC 19



(2) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2324-338.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221:

AGAAGACAAA ATCGCAGGA 19



(2) INFORMATION FOR SEQ ID NO: 222: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2333-423.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222:

GAGACGCTAT CTATGCAAGG AGG 23



(2) INFORMATION FOR SEQ ID NO: 223: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2341-485.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223:

TTTTATCTGT CTGTTACCTA CAC 23



(2) INFORMATION FOR SEQ ID NO: 224: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 
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(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2342-217.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:

TTTTGCCTTG GACTTTCATG ACA 23



(2) INFORMATION FOR SEQ ID NO: 225: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2362-270.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225:

TCTCTCTTGG GTGGTTCCTC AAC 23



(2) INFORMATION FOR SEQ ID NO: 226: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-23G1-329.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:

AAAATGATGA ACCATATAC 19



(2) INFORMATION FOR SEQ ID NO: 227:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2367- G 1.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227:



TA.^vACATTTC ATTATTTCAG AAA 



(2) INFORMATION FOR SEQ ID NO: 228:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2371-93.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228:

TTTTAAACTT TCCTAATACT TAG 23
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(2) INFORMATION FOR SEQ ID NO: 229: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 09-2 37y-:.*00.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229:

(:;AGAy\CTTCC TGTTGAACCT GTT 23

(2) INFORMATION FOR SEQ ID NO: 230: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2381-394.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230:

AGTGGTCTTC AGGTTATTGG TAG 23

(2) INFORMATION FOR SEQ ID NO: 231: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
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(ii) MOLECULE TYPE: DNA 

(vi) OKIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo M- UiR .mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231:



attttaa(:;ag ciaaaacttaa tgc 



(2) INFORMATION FOR SEQ ID NO: 232:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo : ^J-^H**) .mis1

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232: 



GATCAAGCAG TGCCCACTC 



(2) INFORMATION FOR SEQ ID NO: 233: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2559-253.mis1

(B) LOCATION: 1..23 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233:
CAGGTGTTTT CATGCCCTCT TAG 23

(2) INFORMATION FOR SEQ ID NO: 234:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2 1j bb- 1 i 2 .mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234:
GCCTTCACAA CCGCAGAGGC AAG 23

(2) INFORMATION FOR SEQ ID NO: 235:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2567-329.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235: 
CACTGTCAGA TATGAAATGA TGC 23



(2) INFORMATION FOR SEQ ID NO: 236: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 9*i->!:i7()-:^ 1 8 .mis1
(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236:



AGTTCCAAAT TATGAGAAG 



(2) INFORMATION FOR SEQ ID NO: 237:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2571-242.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237: 



ATAATGAATG AGTATTTGAT ATT 



(2) INFORMATION FOR SEQ ID NO: 238: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens
(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2610-121.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238:
TTTTCCCTTC CCTACGTATG GCT 



(2) INFORMATION FOR SEQ ID NO: 239:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2615-83.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239:
TTTTAATCAC AAGTTGGCAA AAG 23



(2) INFORMATION FOR SEQ ID NO: 240: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2620-227.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240: 
TTGACTGGGC TCCTGATGTG TCC 23



(2) INFORMATION FOR SEQ ID NO: 241:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-262 ^ 07 .mis1
(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241:
ATCTGGCCAT AGGCAGAACA TTG 23 



(2) INFORMATION FOR SEQ ID NO: 242: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2625-*70.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242: 
AGTGACTCAA CCAGAAAGAG AGC 23



(2) INFORMATION FOR SEQ ID NO: 243: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo J0-(;7 .mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243:
TAAATTCTGC CTAGAAGATT AAG 23



(2) INFORMATION FOR SEQ ID NO: 244:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2633-129.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244:



TTTTTATTTC TTCCCCTAGG CAA 



(2) INFORMATION FOR SEQ ID NO: 245:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 
(A) NAME/KEY: potential microsequencing oligo 99-2634-341.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245:



c;gaatcaata TTTATTTATT ATC 



(2) INFORMATION FOR SEQ ID NO: 246:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2637-213.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246:



CCATCACTTC CTCCTAGTGA AAA 



(2) INFORMATION FOR SEQ ID NO: 247:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2602-255.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247: 



TGAGGGTGTT TCCAGAAGAG ACT 
(2) INFORMATION FOR SEQ ID NO: 248:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-204 [j- 1 1 U .mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248:



CACAAATTAA TTGCATTGTT ATA 



(2) INFORMATION FOR SEQ ID NO: 249: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2647-368.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249: 



TTAAGGCCTT CAACTGATTA GAC 



(2) INFORMATION FOR SEQ ID NO: 250: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo *)*)-:.Mi.; ')- 107 .mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250:
ACTCTOGAGC CTTTTATGA 19



(2) INFORMATION FOR SEQ ID NO: 251: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 9*J-2 1 01-270 .mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251:
AGTGTAGAAA AATTGAAGGT CTG 



(2) INFORMATION FOR SEQ ID NO: 252: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2228-301.mis2

(B) LOCATION: 1..19 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252:
GGCCTTGCCC ATATGGGTC 



(2) INFORMATION FOR SEQ ID NO: 253:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2220-240.mis2

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253: 
AAGGACTCGA ACAGGTAGT 



(2) INFORMATION FOR SEQ ID NO: 254:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2240-281.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254: 
AGAAAAAAAA GATTCGAATT ACT 23



(2) INFORMATION FOR SEQ ID NO: 255: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 'I :!-:^0() .mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255:



CTTAAGATVAA AAGTAAAATA TAA 



(2) INFORMATION FOR SEQ ID NO: 256: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-22'H -8 3 .mis2

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256: 



TACCTACATG GTTTAAGCA 



(2) INFORMATION FOR SEQ ID NO: 257: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens
(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 90-22'l 0 .mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257:
T(;CAAAACTT ATTTTTCTT 



(2) INFORMATION FOR SEQ ID NO: 258:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 9')-22^1 b-7(3 .mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258:
CCAGGGGATG GGCAGACTTC AGG 23



(2) INFORMATION FOR SEQ ID NO: 259: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2250-236.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259:
AATAAAAATA AAAAACCCAA AGT



(2) INFORMATION FOR SEQ ID NO: 260:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2251-151.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260:



TTTTACAGCC A.^.^CTAGTCT ATC 



(2) INFORMATION FOR SEQ ID NO: 261: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2269-179.mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261: 



TTGATCTTGA TAGGCTGTA 



(2) INFORMATION FOR SEQ ID NO: 262: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 'V)-22'n'Ai).^.\i\xi'>2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262:
GAAGATAAGA AAATCAAGG 19



(2) INFORMATION FOR SEQ ID NO: 263:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2272-100.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263:



TTTTACTTGC AATATTTCAC AGT 23 



(2) INFORMATION FOR SEQ ID NO: 264: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 
(A) NAME/KEY: potential microsequencing oligo 99-2273-528.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264:



a(.":oc:atttat ttcatattta tta 2:\ 



(2) INFORMATION FOR SEQ ID NO: 265:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-227 66 .mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265:



TAGTATCCCT ATTCACAGTT TTT 23



(2) INFORMATION FOR SEQ ID NO: 266: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2278-276.mis2

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266: 



TTGTTGGAGA TGCACAGGC 19
(2) INFORMATION FOR SEQ ID NO: 267:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2312-358.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267:



G(.";CCATTTAC CCAGAAGGCC TAC 2 3 



(2) INFORMATION FOR SEQ ID NO: 268:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2315-213.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268: 



TTTTTTTTAA AATAAGGTTT TCT 



(2) INFORMATION FOR SEQ ID NO: 269: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo \)9~2320''//.yA.ix\is::

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269:



AAATTCAT'i'A AATTTATAAA AAC 



(2) INFORMATION FOR SEQ ID NO: 270:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2321-82.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270: 



AGGAAATATT TGAGTAGGTA TCC 23



(2) INFORMATION FOR SEQ ID NO: 271: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2324-338.mis2

(B) LOCATION: 1..23 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271:
TTTTTACTGT TGAGGGATTT CTT 



(2) INFORMATION FOR SEQ ID NO: 272:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2333-'1 23 .mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272: 
TTTTGCTGTC CAAJVTGTTGA ACA 



(2) INFORMATION FOR SEQ ID NO: 273:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2341-405.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273:
AGAGCCTGTG CGATTCTTTG TAA 23



(2) INFORMATION FOR SEQ ID NO: 274: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo *M)-2 J1:^-:M7 .mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 274:
CACAATCTGG CCTGTTTCTA GAA 23



(2) INFORMATION FOR SEQ ID NO: 275: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2362-270.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 275: 
TTTTACTTGG TCAAGGTCAC ACA 23



(2) INFORMATION FOR SEQ ID NO: 276: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens
(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-23CM -32^ .mis2
(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 276:
CAATTATATO TTACCTTGCC TCA 23



(2) INFORMATION FOR SEQ ID NO: 277:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2367-01.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277:
TTTTTTGGTG AAJ\ATGCATA TTA 



(2) INFORMATION FOR SEQ ID NO: 278: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2371-93.mis2

(B) LOCATION: 1..23 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278: 

ATTATGTAAA AAGTAGGCAG TGA 23 



(2) INFORMATION FOR SEQ ID NO: 279:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2378-200.mis2
(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 279:
GGACGACAGG ACAGTTCTA 19 



(2) INFORMATION FOR SEQ ID NO: 280: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2381-394.mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280: 
TTTAGCTCCC CTACTTTTC 19 



(2) INFORMATION FOR SEQ ID NO: 281: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo *)*)-:-M IJ-.HiH .mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281:

h.hmKT\KXQ. TACAATTCT 19

(2) INFORMATION FOR SEQ ID NO: 282:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 'jy- 2^ 1 5 .mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 282:
AGTCACAGCT CCCTGGAGGG TGG 23

(2) INFORMATION FOR SEQ ID NO: 283: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 
(A) NAME/KEY: microsequencing oligo 99-2559-253.mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283: 



CAI'GGATGAT GTGACACAC 



(2) INFORMATION FOR SEQ ID NO: 284:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-25c-o- 1 12 .mis2

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284:



AGf.GTGGCCA AGCTCCTTC 



(2) INFORMATION FOR SEQ ID NO: 285: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2567-329.mis2

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285: 



TATAGCCCAA AGAAAGCCA 19
(2) INFORMATION FOR SEQ ID NO: 286:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 90- 2070-2 1 0 .mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286:



AACTTAGCCA CTTCAGAGGC CTC 



(2) INFORMATION FOR SEQ ID NO: 287:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2571-242.mis2

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 287:



GCTGACACAT TTAATTATA 



(2) INFORMATION FOR SEQ ID NO: 288: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2610-121.mis2
(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288:



c;ac;aattttc taac:tgcagc ata 



(2) INFORMATION FOR SEQ ID NO: 289: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo f)0-2 C 1 0-9 3 .mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 289:
TTCCCAGAAG ATGAGAATTT GCT 23



(2) INFORMATION FOR SEQ ID NO: 290: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2620-227.mis2

(B) LOCATION: 1..23 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 290:
TTTTAACACC CAGCAAGATA CCC 



(2) INFORMATION FOR SEQ ID NO: 291:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-202.1 -^1 07 .mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 291: 
TTTTCTCTTT CCCCATCTCT CCC 



(2) INFORMATION FOR SEQ ID NO: 292: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2625-70.mis2

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292: 
TTTTCTCTCT TCKTCCTCTC TCC 



(2) INFORMATION FOR SEQ ID NO: 293: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 2630 ~ Ul .mis2
(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 293:



TTTTACTCCC TGTTCTGGAC CAA 



(2) INFORMATION FOR SEQ ID NO: 294: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-2633-129.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 294:



TCAAGGGTTC TCTCATTGTC TAC 



(2) INFORMATION FOR SEQ ID NO: 295: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens
(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2634-341.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295:
TTTTTAAATA ATGTCTCACC TGT 



(2) INFORMATION FOR SEQ ID NO: 296:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2637-213.mis2
(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296:
TTTTAAAACC CACCCTCCTT TGA 



(2) INFORMATION FOR SEQ ID NO: 297: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2642-255.mis2

(B) LOCATION: 1..19 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297: 
TCACTTCAGA TTCAAATGC 19



(2) INFORMATION FOR SEQ ID NO: 298:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 5- 118 .mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298:
TTTTCAGATT CTTCATTGCT AGO 23 



(2) INFORMATION FOR SEQ ID NO: 299: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2647-368.mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299: 
AGATAATGTG AGTGGGCCT 19 



(2) INFORMATION FOR SEQ ID NO: 300: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo ')')-2til 1 07 .mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 300:



AGTTTCAGTG CATTGCTGTC CTG 



(2) INFORMATION FOR SEQ ID NO: 301: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-344
(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-344-mis1

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-344-mis2

(B) LOCATION: complement 25..43

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301:

TGCTGCCAAG GATCCATGTC AGCATGCTCC TCTCTGAGCC CTGGTCT 47



(2) INFORMATION FOR SEQ ID NO: 302: 



WO 99/04038

PCT/IB98/01193



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-366

(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base t

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-366-mis1

(B) LOCATION: 5..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-366-mis2
(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302:

AGGGCCTGGC TTCAGGGACA GCTTAGGAAA TGTTTGTTGA GTTAGTG 47



(2) INFORMATION FOR SEQ ID NO: 303:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-359 

(B) LOCATION: 1..47 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24

(D) OTHER INFORMATION: base g 



(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-359-mis1
(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-359-mis2
(B) LOCATION: complement 25..43

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303:
CTACACACTC ATCCCGTCCA TCCCGTCTCA ACAAATCCTG GCAGCTC 47 

(2) INFORMATION FOR SEQ ID NO: 304: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-355

(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g 

(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-355-mis1

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-355-mis2

(B) LOCATION: complement 25..43

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 304:
GGAGTTTCGG GGAGTTTCGG GAGGGTTCCT GGGAAGAAGC TCCTCCC 47



(2) INFORMATION FOR SEQ ID NO: 305: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-365
(B) LOCATION: 1..48

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base c

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-365-mis1

(B) LOCATION: 5..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-365-mis2
(B) LOCATION: complement 25..48

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 305:
CCTACCAAGC AAGCAGCCCC AGCCTAGGGT CAGACAGGGT GAGCCTC 47



(2) INFORMATION FOR SEQ ID NO: 306: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2452

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: Extracted from sequence gb:M10065

(3909..3955)

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24 

(D) OTHER INFORMATION: base c

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2452-mis1

(B) LOCATION: 5..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2452-mis2

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306:

TCCGCGCGGA CATGGAGGAC GTGCGCGGCC GCCTGGTGCA GTACCCC 47



(2) INFORMATION FOR SEQ ID NO: 307:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-344 
(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID301

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base g; a in SEQ ID301 
(ix) FEATURE: 

(A) NAME/KEY: Potential microsequencing oligo 99-344-mis1

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-344-mis2

(B) LOCATION: complement 25..43

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307:

TGCTGCCAAG GATCCATGTC AGCGTGCTCC TCTCTGAGCC CTGGTCT 47



(2) INFORMATION FOR SEQ ID NO: 308: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: polymorphic fragment 99-366

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID302

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base c; t in SEQ ID302

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-366-mis1
(B) LOCATION: 5..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-366-mis2

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 308:

ACGGCCTGGC TTCAGGGACA GCTCAGGAAA TGTTTGTTGA GTTAGTG 47



(2) INFORMATION FOR SEQ ID NO: 309:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-359 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID303 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base a; g in SEQ ID303

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-359-mis1

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-359-mis2

(B) LOCATION: complement 25..43

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 309:
CTACACAGTC ATCCCCTCCA TCCAGTCTCA ACAAATCCTG GCAGCTC 47

(2) INFORMATION FOR SEQ ID NO: 310:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-355 

(B) LOCATION: 1..47

(D) OTHER INFORMATION: variant version of SEQ ID304

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base a; g in SEQ ID304
(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-355-mis1

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-355-mis2

(B) LOCATION: complement 25..43

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 310:
GGAGTTTCGG GGAGTTTCGG GAGAGTTCCT GGGAAGAAGC TCCTCCC 47



(2) INFORMATION FOR SEQ ID NO: 311: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs



(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-365

(B) LOCATION:

(D) OTHER INFORMATION: variant version of SEQ ID305

(ix) FEATURE:

(A) NAME/KEY: polymorphic base
(B) LOCATION: 24

(D) OTHER INFORMATION: base t; c in SEQ ID305

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-365-mis1
(B) LOCATION: 5..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-365-mis2

(B) LOCATION: complement 25..48

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 311:
CCTACCAAGC AAGCAGCCCC AGCTTAGGGT CAGACAGGGT GAGCCTC 47



(2) INFORMATION FOR SEQ ID NO: 312: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-2452 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID306 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID306

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-2452-mis1

(B) LOCATION: 5..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-2452-mis2
(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312:
TCCGCGCGGA CATGGAGGAC GTGTGCGGCC GCCTGGTGCA GTACCCC 47

(2) INFORMATION FOR SEQ ID NO: 313:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: upstream amplification primer for SEQ ID301 and SEQ

ID307

(B) LOCATION: 1..20

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 313:

GCTCTCATAT TCATTGGGTG 20 

(2) INFORMATION FOR SEQ ID NO: 314: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID302 and SEQ 

ID308 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 314:

TCTCTCCCGT GTTAAATG 18



(2) INFORMATION FOR SEQ ID NO: 315:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: upstream amplification primer for SEQ ID303 and SEQ

ID309

(B) LOCATION: 1..18
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 315:

AATCTTCTTG CTCCTGTC 18



(2) INFORMATION FOR SEQ ID NO: 316: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID304 and SEQ 

ID310 

(B) LOCATION: 1..18
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 316:

AGGTTAGGGG TGTATTTC 18

(2) INFORMATION FOR SEQ ID NO: 317:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: upstream amplification primer for SEQ ID305 and SEQ

ID311

(B) LOCATION: 1..18
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 317:

AGACTGTGAC CTTAGACC 18

(2) INFORMATION FOR SEQ ID NO: 318: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: upstream amplification primer for SEQ ID306 and SEQ 

ID312 

(B) LOCATION: 1..18 

(D) OTHER INFORMATION: Extracted from sequence gb:M10065

(3791..3808)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 318: 

GACGAGACCA TGAAGGAG 18 

(2) INFORMATION FOR SEQ ID NO: 319: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID301 and

SEQ ID307

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 319:

TCGCTGCGGT TAGATGCTC 19



(2) INFORMATION FOR SEQ ID NO: 320: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID302 and

SEQ ID308

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 320: 

AGGGGTAACT CTTGATTG 18 



(2) INFORMATION FOR SEQ ID NO: 321: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens
(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID303 and

SEQ ID309

(B) LOCATION: 1..18
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 321:

ACCAACCCAT AGCTTCTC 18



(2) INFORMATION FOR SEQ ID NO: 322:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID304 and

SEQ ID310

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 322:

ATACAGCCAG GGAGATAG 18



(2) INFORMATION FOR SEQ ID NO: 323: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: downstream amplification primer for SEQ ID305 and 

SEQ ID311 

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 323:

AATTGCTACC CCCAATTC 18



(2) INFORMATION FOR SEQ ID NO: 324:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID306 and

SEQ ID312

(B) LOCATION: 1..18

(D) OTHER INFORMATION: Extracted from sequence gb:M10065
(complement 4370..4395)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 324:

TCGAACCAGC TCTTGAGG 18



(2) INFORMATION FOR SEQ ID NO: 325: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-344.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 325: 



TGCTGCCAAG GATCCATGTC AGC 23

(2) INFORMATION FOR SEQ ID NO: 326:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-366.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 326:



CCTGGCTTCA GGGACAGCT 19 



(2) INFORMATION FOR SEQ ID NO: 327: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-359.mis1

(B) LOCATION: 1..23 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 327: 



CTACAGAGTC ATCGCCTCCA TCC 23



(2) INFORMATION FOR SEQ ID NO: 328:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-355.mis1

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 328:

GGAGTTTCGG GGAGTTTCGG GAG 23

(2) INFORMATION FOR SEQ ID NO: 329: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-365.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 329:
CCAAGCAAGC AGCCCCAGC 19 

(2) INFORMATION FOR SEQ ID NO: 330: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-2452.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 330:
CGCGGACATG GAGGACGTG 19

(2) INFORMATION FOR SEQ ID NO: 331:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-344.mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 331:
CAGGGCTCAG AGAGGAGCA 19



(2) INFORMATION FOR SEQ ID NO: 332: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: potential microsequencing oligo 99-366.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 332:
CACTAACTCA ACAAACATTT CCT 23



(2) INFORMATION FOR SEQ ID NO: 333:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-359.mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 333:

TGCCAGGATT TGTTGAGAC 19



(2) INFORMATION FOR SEQ ID NO: 334: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: microsequencing oligo 99-355.mis2

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 334:

GGAGCTTCTT CCCAGGAAC 19



(2) INFORMATION FOR SEQ ID NO: 335: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens
(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-365.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 335:



GAGGCTCACC CTGTCTGACC CTA 23 



(2) INFORMATION FOR SEQ ID NO: 336:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-2452.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 336:

GCGGTACTGC ACCAGGCGGC CGC 23



